7/31/2019 Regression Equation
The least-squares estimates of the coefficients of the sample regression line ŷ = a + bx are

b = Cov(x, y) / Var(x) = S_xy / S_x² = [n Σᵢ₌₁ⁿ xᵢyᵢ − (Σᵢ₌₁ⁿ xᵢ)(Σᵢ₌₁ⁿ yᵢ)] / [n Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²]

a = ȳ − b x̄
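These formulas can be checked numerically; below is a minimal sketch in Python on small made-up data (not the textbook sample), cross-checked against numpy's own least-squares fit.

```python
import numpy as np

# Made-up illustrative data (not the Walpole example).
x = np.array([50.0, 55.0, 60.0, 65.0, 70.0])
y = np.array([74.0, 80.0, 85.0, 89.0, 93.0])
n = len(x)

# b = [n*sum(x*y) - sum(x)*sum(y)] / [n*sum(x^2) - (sum(x))^2]
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
# a = y-bar - b * x-bar
a = y.mean() - b * x.mean()

# np.polyfit fits the same least-squares line, so it should agree.
slope, intercept = np.polyfit(x, y, 1)
```
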
Now, for any fixed value of x, each observation in our sample satisfies the relation

yᵢ = α + βxᵢ + εᵢ

where εᵢ is a random error representing the vertical deviation of the point from the population regression line (the parametric regression equation). From the previous assumptions on yᵢ, εᵢ must necessarily be a value of a random variable having a mean of zero and variance σ². In terms of the sample regression line, we can also write:

yᵢ = ŷᵢ + eᵢ

An essential part of regression analysis involves the construction of confidence intervals for α and β and the testing of hypotheses concerning these regression coefficients. The hypotheses for testing the coefficients are α = 0 and β = 0. However, the unknown variance σ² must be estimated from the data. An unbiased estimate of σ², with n − 2 degrees of freedom, denoted by Sₑ², is given by the formula:
Sₑ² = SSE / (n − 2) = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / (n − 2)
In the usual sample variance formula we take away one degree of freedom to obtain an unbiased estimate of the population variance, since only the population mean μ is replaced by the sample mean in the calculation. Here it is necessary to take away 2 degrees of freedom in the formula for Sₑ², because 2 degrees of freedom are lost by replacing α and β with a and b in the calculation of the ŷᵢ's. A simpler formula for the calculation of SSE is as follows:
SSE = (n − 1)(S_y² − b² S_x²)

where

S_x² = [n Σᵢ₌₁ⁿ xᵢ² − (Σᵢ₌₁ⁿ xᵢ)²] / (n(n − 1))   and   S_y² = [n Σᵢ₌₁ⁿ yᵢ² − (Σᵢ₌₁ⁿ yᵢ)²] / (n(n − 1))
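The shortcut formula can be verified against the direct sum of squared residuals; a quick sketch on the same toy data (not the book's sample):

```python
import numpy as np

# Toy data for checking SSE = (n-1)*(Sy^2 - b^2*Sx^2).
x = np.array([50.0, 55.0, 60.0, 65.0, 70.0])
y = np.array([74.0, 80.0, 85.0, 89.0, 93.0])
n = len(x)

Sx2 = (n * np.sum(x ** 2) - np.sum(x) ** 2) / (n * (n - 1))  # sample variance of x
Sy2 = (n * np.sum(y ** 2) - np.sum(y) ** 2) / (n * (n - 1))  # sample variance of y
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
a = y.mean() - b * x.mean()

sse_shortcut = (n - 1) * (Sy2 - b ** 2 * Sx2)
sse_direct = np.sum((y - (a + b * x)) ** 2)   # direct sum of squared residuals
se2 = sse_shortcut / (n - 2)                  # unbiased estimate of sigma^2
```

Both routes give the same SSE, which is the point of the shortcut: it needs only the summary quantities already computed for a and b.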
Test for Linearity of the Regression Equation
OR
Validity of the Regression Model
We define the regression to be linear when all the means of y corresponding to each x fall on a straight line. A linear regression model is generally preferred over a nonlinear one. We can test the linearity of the regression equation by using the ANOVA test. If linearity is confirmed, we can say that the regression model is valid, and we then develop the model.
Calculation of ANOVA:

Values of x            50         55         65         70       Total
Values of ŷ        74.893     79.378     88.348     92.833
corresponding      74.893     79.378     88.348     92.833
to each x                     79.378     88.348     92.833
                              79.378
Sum               149.786    317.512    265.044    278.499    1010.841
Square of sum    22435.85   100813.9   70248.32   77561.69     1021800
Square of sum/nᵢ 11217.92   25203.47   23416.11    25853.9    85149.96

(In the Total column, 1021800 is the square of the grand total 1010.841, and 85149.96 is that square divided by N = 12.)
Regression sum of squares = (11217.92 + 25203.47 + 23416.11 + 25853.9) − 85149.96 ≈ 541.69
Residual sum of squares = SSE = 186.557, and Sₑ² = 186.557/10 = 18.656
ANOVAb
Model Sum of Squares df Mean Square F Sig.
1 Regression 541.693 1 541.693 29.036 .000a
Residual 186.557 10 18.656
Total 728.250 11
a. Predictors: (Constant), TestScore
b. Dependent Variable: CheScore
Here the significance value 0.000 < 0.05, which means that the H₀ of the ANOVA test is rejected.
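The F ratio in the table can be reproduced directly from the sums of squares; a small sketch (the 5% critical value F(1, 10) ≈ 4.96 is an assumed value read from a standard F table):

```python
# Recompute the F statistic from the ANOVA table's sums of squares.
ssr, df_reg = 541.693, 1      # regression sum of squares and its df
sse, df_res = 186.557, 10     # residual sum of squares and its df

msr = ssr / df_reg            # mean square for regression
mse = sse / df_res            # mean square error, equal to Se^2 = 18.656
F = msr / mse                 # about 29.04, far above ~4.96, so H0 is rejected
```
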
Inference: H₀ states that all the means are equal. It is rejected, which means the regression line is not horizontal; the line has some nonzero slope, and that slope reflects the correlation between the predictor and the response.
Inferences Concerning the Regression Coefficients
Confidence interval for α
A (1 − α)100% confidence interval for the parameter α in the regression line is

a − t_{α/2} · Sₑ · √(Σᵢ₌₁ⁿ xᵢ²) / (Sₓ √(n(n − 1)))  <  α  <  a + t_{α/2} · Sₑ · √(Σᵢ₌₁ⁿ xᵢ²) / (Sₓ √(n(n − 1)))

where t_{α/2} is a value of the t-distribution with n − 2 degrees of freedom. Note that the symbol α is being used here in two totally unrelated ways: first as the level of significance and then as the intercept of the regression line.
Confidence interval for β
A (1 − α)100% confidence interval for the parameter β in the regression line is

b − t_{α/2} · Sₑ / (Sₓ √(n − 1))  <  β  <  b + t_{α/2} · Sₑ / (Sₓ √(n − 1))

where t_{α/2} is again a value of the t-distribution with n − 2 degrees of freedom.
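As a numerical check, the slope interval reproduces the 95% bounds (0.526, 1.268) that SPSS reports for the IQ-score coefficient later in these notes. A sketch using the example's summary values (t₀.₀₂₅ with 10 df ≈ 2.228, taken from a t table):

```python
import math

# Summary values from the Walpole example (n = 12 students).
b = 0.897            # estimated slope
se = 18.656 ** 0.5   # standard error of estimate, S_e = 4.319
sx = 7.82140         # sample standard deviation of the IQ scores
n = 12
t = 2.228            # t_{0.025} with n - 2 = 10 degrees of freedom

half_width = t * se / (math.sqrt(n - 1) * sx)
lower, upper = b - half_width, b + half_width   # approx. (0.526, 1.268)
```
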
Predictions
The equation ŷ = a + bx may be used to predict the mean response μ_{Y|x₀} at x = x₀, where x₀ is not necessarily one of the pre-chosen values, or it may be used to predict a single value y₀ of the variable when x = x₀. We would expect the error of the prediction to be higher in the case of a single predicted value than in the case where a mean is predicted. This, then, will affect the width of our confidence intervals for the values being predicted.

Predictions for μ_{Y|x₀}
A (1 − α)100% confidence interval for the mean μ_{Y|x₀} is given by:

ŷ₀ − t_{α/2} · Sₑ · √(1/n + (x₀ − x̄)² / ((n − 1)Sₓ²))  <  μ_{Y|x₀}  <  ŷ₀ + t_{α/2} · Sₑ · √(1/n + (x₀ − x̄)² / ((n − 1)Sₓ²))
Predictions for y₀
A (1 − α)100% confidence interval for the single value y₀ when x = x₀ is given by:

ŷ₀ − t_{α/2} · Sₑ · √(1 + 1/n + (x₀ − x̄)² / ((n − 1)Sₓ²))  <  y₀  <  ŷ₀ + t_{α/2} · Sₑ · √(1 + 1/n + (x₀ − x̄)² / ((n − 1)Sₓ²))
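A hedged sketch of both intervals at a hypothetical point x₀ = 60, using the example's summary values (x̄ = 60.4167, Sₓ = 7.8214, Sₑ² = 18.656, t₀.₀₂₅ with 10 df ≈ 2.228); note how much wider the interval for a single y₀ comes out:

```python
import math

a, b = 30.043, 0.897     # fitted regression coefficients
se = 18.656 ** 0.5       # standard error of estimate
xbar, sx, n = 60.4167, 7.82140, 12
t = 2.228                # t_{0.025}, 10 df
x0 = 60.0                # hypothetical new x value

y0 = a + b * x0
core = 1.0 / n + (x0 - xbar) ** 2 / ((n - 1) * sx ** 2)
hw_mean = t * se * math.sqrt(core)         # half-width, CI for the mean response
hw_single = t * se * math.sqrt(1 + core)   # half-width, interval for a single y0
```
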
What is the necessary condition for simple linear regression?
The necessary condition for simple linear regression is that the test must be run between two scale variables. The variables must be correlated with each other.
How to run the test?
For understanding, we take the example from the book (Walpole, page 347). In this example the two variables are IQ test score and Chemistry test score. Both are scale measurements, and theoretically they are correlated with each other.
Interpretation of output:
Descriptive Statistics
Mean Std. Deviation N
Chemistry test score 84.2500 8.13662 12
IQ test score 60.4167 7.82140 12
Descriptive statistics of the two variables.
Correlations
Chemistry test
score IQ test score
Pearson Correlation Chemistry test score 1.000 .862
IQ test score .862 1.000
Sig. (1-tailed) Chemistry test score . .000
IQ test score .000 .
N Chemistry test score 12 12
IQ test score 12 12
The independent and dependent variables are correlated with each other, therefore the test can be run. The significance value is 0.000, which means the test is significant: the hypothesis that there is no correlation between the two study variables is rejected.
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .862a .744 .718 4.31923
a. Predictors: (Constant), IQ test score
b. Dependent Variable: Chemistry test score
Std. Error of the Estimate: √18.656 = 4.319, the square root of the mean square of the residuals (Sₑ).
R Square = 0.744: 74.4% of the variation in the dependent variable is explained by the predictor(s) of the model.
ANOVAb
Model Sum of Squares df Mean Square F Sig.
1 Regression 541.693 1 (df = no. of predictors) 541.693 29.036 .000a
Residual 186.557 10 18.656
Total 728.250 11 (df = N − 1)
a. Predictors: (Constant), IQ test score
b. Dependent Variable: Chemistry test score
The value of the F-statistic is 29.036, which is very high, and the p-value (the sig. value) is 0.000, which is less than 0.05 (the level of significance). This implies that the ANOVA test is significant and that the model built from the given predictors is valid. (See page 365 for further study.)
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 (Constant) 30.043 10.137 2.964 .014 7.458 52.629
IQ test
score
.897 .167 .862 5.389 .000 .526 1.268 1.000 1.000
a. Dependent Variable: Chemistry test score
1. t-values are calculated by taking the ratio of B to its standard error (e.g., 30.043/10.137 = 2.964).
2. As the standard error increases, the t-value decreases; as the t-value decreases, the significance value (p-value) increases; and if the p-value becomes greater than the level of significance (usually 0.05), the predictor becomes insignificant, i.e., less important for the model.
3. Here the significance values (p-values) are 0.014 and 0.000; both are less than 0.05, which means that the constant term as well as the coefficient of x are both significant for the model.
4. Standardized Coefficients (Beta): these can be calculated by standardizing all the variables and then running the regression analysis again; the coefficients obtained are the standardized coefficients.
5. If there is more than one predictor, the standardized coefficients rank the importance of the predictors: a larger value indicates a more important predictor compared to one with a smaller value.
6. The 95% confidence interval for the constant falls between 7.458 and 52.629. It means the researcher is 95% confident that the true value of the constant (estimated as 30.043) lies between 7.458 and 52.629. (See pages 358 to 360 of Walpole for further study.)
7. As this is a simple linear regression model with only one predictor, Tolerance and VIF cannot be explained well here. They will be discussed with the multiple regression model.
The variable saved during the run of the test is RES_1, which shows the residual value; we can check it by taking the difference between the predicted value (PRE_1) and the actual value (chemistryScore).
The variables 7, 8 and 9, 10 show the 95% confidence intervals for the predicted value of y at a specific value of x: one pair for the mean response and one pair for an individual value. See pages 361 to 363 of Walpole for further understanding. It is important to understand that the interval for an individual value is wider than the interval for the mean, because a single observation varies more than a sample mean.
Final Regression model

ŷ = 30.043 + 0.897x

The predictor explains 74.4% (R-Square = 0.744) of the variation in the model.
Exponential regression equation
Or
Log Transformation
If a set of data appears to be best represented by a nonlinear regression curve, we must then try to determine the form of the curve and estimate the parameters. A nonlinear regression curve means that the mean values of y corresponding to each value of x do not fall on a straight line. In that situation we most often apply an exponential curve of the form:

μ_{y|x} = γδˣ

where γ and δ are parameters to be estimated from the data. Denoting these estimates by c and d, respectively, we can estimate μ_{y|x} by ŷ from the sample regression curve

ŷ = c dˣ

Taking log base 10 on both sides:

log ŷ = log c + (log d) x

and each pair of observations in the sample satisfies the relation

log yᵢ = log c + (log d) xᵢ + log eᵢ,   i.e.,   yᵢ* = a + b xᵢ + eᵢ*

where a = log c and b = log d (with yᵢ* = log yᵢ and eᵢ* = log eᵢ). Therefore, it is possible to find a and b by the formulas discussed above and then find c and d by taking the antilog of those values.
Note: the log transformation is usually a good transformation technique for addressing nonlinearity of μ_{y|x}.
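A minimal sketch of the procedure on toy data generated from y = 2(1.5)ˣ, so that the recovered c and d can be checked exactly:

```python
import math

# Toy data that is exactly exponential: y = 2 * 1.5**x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * 1.5 ** x for x in xs]
n = len(xs)

# Regress log10(y) on x with the ordinary least-squares formulas.
ly = [math.log10(v) for v in ys]
b = (n * sum(x * w for x, w in zip(xs, ly)) - sum(xs) * sum(ly)) / \
    (n * sum(x * x for x in xs) - sum(xs) ** 2)
a = sum(ly) / n - b * sum(xs) / n

# Back-transform: taking antilogs recovers the original curve.
c, d = 10 ** a, 10 ** b
```

Because the toy data follow the exponential model exactly, the fit recovers c = 2 and d = 1.5 up to floating-point error.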
Multiple regression equation
What is a Multiple Regression Equation?
A multiple regression equation is a linear regression model with one dependent variable and multiple independent variables. It is a dependence technique.

μ_{Y|x₁,x₂,…,x_r} = β₀ + β₁x₁ + β₂x₂ + … + β_r x_r
Why Multiple Regression Analysis?
Multiple regression analysis is a statistical technique that can be used to analyze the relationship
between a single dependent (criterion) variable and several independent (predictors) variables.
The objective of multiple regression analysis is to use the independent variables whose values
are known to predict the single dependent value selected by the researcher.
When to use Multiple Regression Analysis?
It is carried out when all the variables are scale in measurement. Sometimes a variable with ordinal measurement may also be used in MLR, but without changing its measurement level.
SLR can be effective with a sample size of 20, but multiple regression requires a minimum sample of 50, and preferably 100 observations, for most research situations.
The minimum ratio of observations to variables is 5:1, but the preferred ratio is 15:1 or 20:1, and it should increase when stepwise estimation is used.
As the structures of y and the error term are the same, we study the structure of the error term instead of y, because it is easier to study. For the study of MLR, the following assumptions about the error term should not be violated:
- The error term must hold normality.
- The variables are identically distributed.
- All predictors are independent (no multicollinearity).
- The error term structure holds linearity.
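The model above can be fitted by ordinary least squares; a minimal sketch on made-up data (not the car_sales.sav file), where the true coefficients are known so the fit can be checked:

```python
import numpy as np

# Hypothetical data: y = 3 + 2*x1 - 1*x2 + small noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept b0.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [b0, b1, b2]
```

With the small noise level used here, the estimated coefficients land close to the true values (3, 2, −1).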
Important terms to understand the discussion
How to perform Multiple Regression Analysis (MLR) in SPSS?
There are two steps involved in the analysis:
1. Run the test in SPSS.
2. Understand and interpret the output of the test.
How to run the test in SPSS?
For running the test we consider the file car_sales.sav from SPSS's sample files.
In the beginning, remove all the variables from variable no. 15 till the end (these are transformed or standardized variables, which are not useful at the initial level of MLR).
Make a correction in the measurement of the variable type, from ordinal to nominal.
Out of the 11 scale variables, sales in thousands is the dependent variable, while all the other scale variables (10 in number) are independent. In other words, there are 10 predictors which estimate the car sales.
Here we are using the Enter method.
Do Continue and OK.
How to understand the output of the test?
Descriptive Statistics
Mean Std. Deviation N
Sales in thousands 59.11232 75.058933 117
4-year resale value 18.03154 11.605632 117
Price in thousands 25.96949 14.149699 117
Engine size 3.049 1.0552 117
Horsepower 181.28 58.592 117
Wheelbase 107.326 8.0506 117
Width 71.190 3.5302 117
Length 187.718 13.8499 117
Curb weight 3.32405 .597177 117
Fuel capacity 17.813 3.7946 117
Fuel efficiency 24.12 4.404 117
Descriptive statistics of all the variables.
Correlations
(columns, in order: Sales in thousands, 4-year resale value, Price in thousands, Engine size, Horsepower, Wheelbase, Width, Length, Curb weight, Fuel capacity, Fuel efficiency)
Pearson Correlation
Sales in thousands
1.000 -.275 -.252 .038 -.153 .407 .178 .272 .067 .138 -.067
4-year resale value
-.275 1.000 .955 .527 .773 -.054 .178 .025 .363 .325 -.399
Price in thousands
-.252 .955 1.000 .649 .853 .067 .301 .183 .511 .406 -.480
Engine size .038 .527 .649 1.000 .862 .410 .672 .537 .743 .617 -.725
Horsepower -.153 .773 .853 .862 1.000 .226 .507 .401 .599 .480 -.596
Wheelbase .407 -.054 .067 .410 .226 1.000 .676 .854 .676 .659 -.471
Width .178 .178 .301 .672 .507 .676 1.000 .743 .736 .672 -.600
Length .272 .025 .183 .537 .401 .854 .743 1.000 .684 .563 -.466
Curb weight .067 .363 .511 .743 .599 .676 .736 .684 1.000 .848 -.819
Fuel capacity .138 .325 .406 .617 .480 .659 .672 .563 .848 1.000 -.809
Fuel efficiency -.067 -.399 -.480 -.725 -.596 -.471 -.600 -.466 -.819 -.809 1.000
Sig. (1-tailed)
Sales in thousands
. .001 .003 .342 .050 .000 .028 .001 .236 .069 .237
4-year resale value
.001 . .000 .000 .000 .283 .027 .393 .000 .000 .000
Price in thousands
.003 .000 . .000 .000 .236 .000 .024 .000 .000 .000
Engine size .342 .000 .000 . .000 .000 .000 .000 .000 .000 .000
Horsepower .050 .000 .000 .000 . .007 .000 .000 .000 .000 .000
Wheelbase .000 .283 .236 .000 .007 . .000 .000 .000 .000 .000
Width .028 .027 .000 .000 .000 .000 . .000 .000 .000 .000
Length .001 .393 .024 .000 .000 .000 .000 . .000 .000 .000
Curb weight .236 .000 .000 .000 .000 .000 .000 .000 . .000 .000
Fuel capacity .069 .000 .000 .000 .000 .000 .000 .000 .000 . .000
Fuel efficiency .237 .000 .000 .000 .000 .000 .000 .000 .000 .000 .
The first column shows the correlation between the dependent variable, sales in thousands, and the independent variables. A high correlation in this column together with a low significance value (p-value) shows that the variable is important for the model; otherwise the variable is insignificant and less important. On the other hand, high correlations in the other columns show that the independent variables are highly correlated with each other, i.e., multicollinearity exists between them. Multicollinearity violates the assumption of independence among predictors. One can observe that when we remove a variable which has multicollinearity, the R-square of the model may increase, which is a good sign for the model. One solution to this problem is to build factors or components by factor analysis before the regression analysis.
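Multicollinearity is commonly quantified with the variance inflation factor, VIF = 1/(1 − Rⱼ²), where Rⱼ² comes from regressing predictor j on the remaining predictors; this is the quantity behind the VIF column in the coefficients tables below. A hedged sketch on made-up data in which x3 is nearly a linear combination of x1 and x2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + rng.normal(scale=0.05, size=n)   # almost collinear predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: regress it on the other columns plus an intercept."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)
```

With this construction the VIFs come out far above the common rule-of-thumb cutoff of 10, signalling severe multicollinearity.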
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .565a .319 .255 64.798014
a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,
Width, Engine size, Fuel capacity, Wheelbase, Curb weight,
Horsepower, Price in thousands
b. Dependent Variable: Sales in thousands
R-Square (the coefficient of determination) equals 0.319, or 31.9%. It means that the 10 predictors together explain 31.9% of the variation in the dependent variable.
ANOVAb
Model Sum of Squares df Mean Square F Sig.
1 Regression 208454.878 10 20845.488 4.965 .000a
Residual 445070.963 106 4198.783
Total 653525.841 116
a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel
capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
b. Dependent Variable: Sales in thousands
The ANOVA test is significant, which shows that the model is valid.
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 (Constant) -225.116 191.901 -1.173 .243 -605.579 155.347
4-year resale
value
.011 2.253 .002 .005 .996 -4.456 4.478 .053 18.890
Price in
thousands
-.015 2.146 -.003 -.007 .994 -4.269 4.239 .039 25.464
Engine size 37.640 15.588 .529 2.415 .017 6.735 68.545 .134 7.474
Horsepower -.612 .342 -.477 -1.788 .077 -1.290 .067 .090 11.095
Wheelbase 6.391 1.787 .685 3.576 .001 2.848 9.934 .175 5.718
Width -.375 3.120 -.018 -.120 .905 -6.561 5.811 .298 3.352
Length -.437 1.077 -.081 -.406 .686 -2.573 1.698 .163 6.149
Curb weight -69.476 29.602 -.553 -2.347 .021 -128.165 -10.787 .116 8.633
Fuel capacity -.157 3.697 -.008 -.043 .966 -7.487 7.172 .184 5.437
Fuel efficiency -2.608 2.931 -.153 -.890 .376 -8.418 3.203 .217 4.602
a. Dependent Variable: Sales in thousands
All predictors which are insignificant should be removed from the model one by one, starting from the variable with the highest significance value.
NOTE:
One is advised to study the residual structure and apply the appropriate transformation before removing variables. The reason for this practice is to find the truly significant variables; otherwise it is quite possible that during the enter-and-remove process one removes a significant variable.
Study of the residual structure:
Usually we analyze the residual structure by drawing a scatter plot of the unstandardized predicted values against the standardized residuals (these variables are generated when the researcher checks the Save options while running the test).
If the predicted values are very large, the residuals will also tend to be large, and as a result the analysis of the residual structure becomes difficult. Therefore we use standardized or studentized residuals. The standardized residual is taken on the y-axis (dependent) and the unstandardized predicted value on the x-axis (independent).
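Standardized residuals are simply the raw residuals divided by Sₑ; a minimal sketch on toy data (not the car_sales file):

```python
import numpy as np

# Toy data; fit the line, then standardize the residuals.
x = np.array([50.0, 55.0, 60.0, 65.0, 70.0])
y = np.array([74.0, 80.0, 85.0, 89.0, 93.0])
n = len(x)

b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
a = y.mean() - b * x.mean()

pred = a + b * x                           # unstandardized predicted values (x-axis)
resid = y - pred                           # raw residuals
se = np.sqrt(np.sum(resid ** 2) / (n - 2))
std_resid = resid / se                     # standardized residuals (y-axis)
```

Plotting `std_resid` against `pred` gives the diagnostic scatter plot described above; a healthy "null plot" shows no pattern.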
Do OK
The graph shows that the variation between the two variables is small initially but increases later on. Compare with the following set of graphs.
Here the graph that we formed resembles graph C, as shown below.
Graph-1
It means that the graph shows heteroscedasticity. Heteroscedasticity means that the variance pattern of the y values across the different predictor values is not the same. So a transformation is required to lift the assumption violation.
Method of Transformations:
When is data transformation required?
Data transformation is required when the study of the residual structure tells us that the model assumptions are being violated. The assumptions are listed below:
- Linearity of the phenomenon measured (checked through a scatter plot of the error term).
- Constant variance of the error terms (homoscedasticity).
- Independence of the error terms (no multicollinearity among predictors).
- Normality of the error term distribution.
Why is data transformation required?
Data transformation gives us three benefits:
1. To correct the violations of the statistical assumptions mentioned above for the multivariate technique.
2. To improve the relationship between the study variable and the predictors.
3. To show the correct picture of the significant variables. Sometimes we find variables which appear insignificant, but after transformation it turns out that the variable was actually significant.
How to do data transformation?
Criteria for the transformation:

Figure  Violation of assumption   Situation of violation               Remedy
a       Null plot                 All the assumptions of the           No remedy is required.
                                  model are met.
b       Non-normality             Flat pattern                         Inverse transformation (1/y)
                                  Negatively skewed                    Square or cube (y² or y³)
                                  Positively skewed                    Square root or log (√y or ln y)
c       Heteroscedasticity        Cone-shaped distribution,            Inverse transformation (1/y)
                                  opening rightward
                                  Cone opening leftward                Square root transformation (√y)
d, h    Heteroscedasticity        A diamond-shaped pattern: high       Usually a log transformation is
                                  variance in the middle and less      advisable. It often happens that
                                  variance at the ends; nonlinearity   when we address one violation,
                                  also exists.                         the other is adjusted
                                                                       simultaneously.
e       Time-based dependence
f       Event-based dependence
g       Normal

Numerous procedures are available for achieving linearity between two variables, but most simple nonlinear relationships can be placed in one of the four categories in the figure below. If the relationship looks like figure a, then either variable can be squared to achieve linearity. When multiple transformation possibilities are shown, start with the top method in each figure and move downward until linearity is achieved.
How to perform the transformation:
The stepwise transformations are shown below.
The dependent variable sales has been transformed into a variable named TransformedSales.
Now see the effect of the transformation, shown below. Before examining the effect, run the regression test again taking TransformedSales as the dependent variable; after that, draw the scatter plot of the standardized residuals against the unstandardized predicted values.
Result of transformation
Before Transformation After transformation
Model Summaryb
Model
R
Square
Adjusted R
Square
Std. Error
of the
Estimate
1 .565a .319 .255 64.798014
a. Predictors: (Constant), Fuel efficiency, 4-year
resale value, Length, Width, Engine size, Fuel
capacity, Wheelbase, Curb weight, Horsepower,
Price in thousands
b. Dependent Variable: Sales in thousands
Model Summaryb
Model
R
Square
Adjusted R
Square
Std. Error of
the Estimate
1 .634a .402 .345 1.08253
a. Predictors: (Constant), Fuel efficiency, 4-year
resale value, Length, Width, Engine size, Fuel
capacity, Wheelbase, Curb weight, Horsepower,
Price in thousands
b. Dependent Variable: TransformedSales
Conclusion:
1. R-Square improved from 31.9% to 40.2%.
2. The scatter plot also shows that it now looks like a null plot.
Enter and Remove the Predictors (Before transformation)
Here we perform the removal of insignificant variables from the model one by one; this exercise is performed without the transformation. The removal takes place in descending order of significance values: the variable with the highest sig. value is removed first, and the procedure continues one by one.
Note: It is quite possible that the researcher considers a particular variable important even when the statistics do not support that idea. It does not matter; if the researcher thinks the variable should be part of the model, it can be.
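The enter-and-remove procedure can be sketched as backward elimination. This is a hedged illustration rather than the SPSS procedure itself: it repeatedly drops the weakest predictor while its |t| statistic is below about 2.0 (an assumed rough 5% critical value), on made-up data in which x3 is pure noise and will typically be eliminated:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 3))                     # candidate predictors x1, x2, x3
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

cols = [0, 1, 2]                                # predictors currently in the model
while cols:
    A = np.column_stack([np.ones(n), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - A.shape[1])       # residual mean square
    se_beta = np.sqrt(np.diag(s2 * np.linalg.inv(A.T @ A)))
    tvals = np.abs(beta / se_beta)[1:]          # skip the constant term
    k = int(np.argmin(tvals))
    if tvals[k] >= 2.0:                         # all remaining predictors significant
        break
    cols.pop(k)                                 # drop the weakest predictor
```

The genuinely informative predictors x1 and x2 survive the loop; the noise predictor x3 is usually the one removed.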
0. Without removing any variable (Consider all predictors)
Model Summary
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .565a .319 .255 64.798014
a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,
Width, Engine size, Fuel capacity, Wheelbase, Curb weight,
Horsepower, Price in thousands
1. Remove variable 4-year resale value
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence Interval
for B
B Std. Error Beta Lower Bound Upper Bound
1 (Constant) -214.894 152.450 -1.410 .161 -516.258 86.471
Price in
thousands
-.441 .731 -.092 -.604 .547 -1.886 1.004
Engine size 34.469 11.271 .525 3.058 .003 12.189 56.750
Horsepower -.563 .250 -.464 -2.254 .026 -1.056 -.069
Wheelbase 4.729 1.317 .529 3.590 .000 2.125 7.333
Width -.222 2.428 -.011 -.092 .927 -5.022 4.577
Length -.129 .748 -.025 -.173 .863 -1.608 1.349
Curb weight -50.079 19.946 -.462 -2.511 .013 -89.508 -10.649
Fuel capacity .460 2.719 .026 .169 .866 -4.916 5.835
Fuel efficiency -1.172 2.266 -.073 -.517 .606 -5.653 3.308
a. Dependent Variable: Sales in thousands
After the removal of 4-year resale value, the significance values of the variables have changed: a few increased and a few decreased. The variable horsepower, which was initially insignificant, has now become significant.
2. Remove variable width
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .560a .314 .275 58.690836
a. Predictors: (Constant), Fuel efficiency, Length, Price in thousands,
Fuel capacity, Engine size, Wheelbase, Curb weight, Horsepower
b. Dependent Variable: Sales in thousands
This time there is almost no change in the value of R-square.
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients t Sig.
95.0% Confidence Interval
for B
B Std. Error Beta Lower Bound Upper Bound
1 (Constant) -224.403 111.287 -2.016 .046 -444.383 -4.423
Price in
thousands
-.433 .722 -.090 -.599 .550 -1.860 .995
Engine size 34.277 11.036 .522 3.106 .002 12.463 56.091
Horsepower -.565 .248 -.466 -2.283 .024 -1.054 -.076
Wheelbase 4.712 1.299 .527 3.628 .000 2.144 7.279
Length -.141 .734 -.028 -.192 .848 -1.592 1.310
Curb weight -50.374 19.616 -.465 -2.568 .011 -89.148 -11.600
Fuel capacity .447 2.706 .026 .165 .869 -4.902 5.796
Fuel efficiency -1.181 2.257 -.074 -.523 .602 -5.642 3.280
a. Dependent Variable: Sales in thousands
After this run only 4 variables are left insignificant, while the constant itself has become significant.
3. Remove variable Fuel Capacity
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .560a .313 .280 58.492268
a. Predictors: (Constant), Fuel efficiency, Length, Price in thousands,
Engine size, Wheelbase, Curb weight, Horsepower
b. Dependent Variable: Sales in thousands
Again, R-square dropped only slightly, i.e., by just 0.001.
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence Interval
for B
B Std. Error Beta Lower Bound Upper Bound
1 (Constant) -221.857 109.840 -2.020 .045 -438.963 -4.750
Price in
thousands
-.423 .718 -.089 -.590 .556 -1.842 .995
Engine size 34.347 10.990 .523 3.125 .002 12.624 56.070
Horsepower -.569 .245 -.469 -2.320 .022 -1.054 -.084
Wheelbase 4.767 1.250 .534 3.815 .000 2.297 7.238
Length -.151 .729 -.030 -.207 .836 -1.592 1.290
Curb weight -48.971 17.621 -.452 -2.779 .006 -83.801 -14.142
Fuel efficiency -1.310 2.111 -.082 -.620 .536 -5.482 2.863
a. Dependent Variable: Sales in thousands
4. Remove variable Length
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .560a .313 .285 58.298917
a. Predictors: (Constant), Fuel efficiency, Price in thousands,
Wheelbase, Engine size, Curb weight, Horsepower
b. Dependent Variable: Sales in thousands
No change in the value of R-square
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence Interval
for B
B Std. Error Beta Lower Bound Upper Bound
1 (Constant) -223.549 109.174 -2.048 .042 -439.327 -7.771
Price in
thousands
-.384 .690 -.080 -.556 .579 -1.747 .979
Engine size 34.172 10.922 .520 3.129 .002 12.586 55.759
Horsepower -.582 .237 -.480 -2.455 .015 -1.050 -.113
Wheelbase 4.585 .884 .513 5.189 .000 2.839 6.331
Curb weight -49.726 17.184 -.459 -2.894 .004 -83.689 -15.762
Fuel efficiency -1.421 2.035 -.089 -.698 .486 -5.443 2.602
a. Dependent Variable: Sales in thousands
5. Remove variable Price in thousand
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .560a .313 .290 57.965284
a. Predictors: (Constant), Fuel efficiency, Wheelbase, Horsepower,
Curb weight, Engine size
b. Dependent Variable: Sales in thousands
No change in the value of R-square
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence Interval for
B
B Std. Error Beta Lower Bound Upper Bound
1 (Constant) -224.038 108.178 -2.071 .040 -437.823 -10.253
Engine size 36.113 10.310 .549 3.503 .001 15.737 56.489
Horsepower -.682 .156 -.563 -4.382 .000 -.990 -.374
Wheelbase 4.746 .830 .531 5.715 .000 3.105 6.386
Curb weight -53.227 15.946 -.491 -3.338 .001 -84.739 -21.715
Fuel efficiency -1.541 2.006 -.096 -.768 .444 -5.506 2.424
a. Dependent Variable: Sales in thousands
6. Remove variable Fuel efficiency
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .556a .309 .291 57.595036
a. Predictors: (Constant), Curb weight, Horsepower, Wheelbase,
Engine size
b. Dependent Variable: Sales in thousands
This time R-square dropped by 0.004, but the predictors still explain 30.9% of the variation in the dependent variable.
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence Interval for
B
B Std. Error Beta Lower Bound Upper Bound
1 (Constant) -288.879 73.316 -3.940 .000 -433.745 -144.012
Engine size 36.856 9.985 .561 3.691 .000 17.127 56.585
Horsepower -.669 .154 -.552 -4.354 .000 -.973 -.365
Wheelbase 4.747 .820 .531 5.787 .000 3.126 6.367
Curb weight -46.288 13.208 -.427 -3.505 .001 -72.386 -20.190
a. Dependent Variable: Sales in thousands
Now only significant variables are left.
Enter and Remove the Predictors (After Transformation)
0. First output with considering all the variables.
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .634a .402 .345 1.08253
a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,
Width, Engine size, Fuel capacity, Wheelbase, Curb weight,
Horsepower, Price in thousands
b. Dependent Variable: LogSale
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 (Constant) -1.097 3.206 -.342 .733 -7.453 5.259
4-year resale
value
-.012 .038 -.101 -.311 .757 -.086 .063 .053 18.890
Price in
thousands
-.036 .036 -.385 -1.015 .312 -.107 .035 .039 25.464
Engine size .310 .260 .244 1.190 .237 -.206 .826 .134 7.474
Horsepower -.003 .006 -.118 -.470 .639 -.014 .009 .090 11.095
Wheelbase .093 .030 .559 3.111 .002 .034 .152 .175 5.718
Width -.026 .052 -.068 -.492 .624 -.129 .078 .298 3.352
Length -.018 .018 -.188 -1.008 .316 -.054 .018 .163 6.149
Curb weight .262 .495 .117 .530 .597 -.718 1.242 .116 8.633
Fuel capacity -.059 .062 -.166 -.949 .345 -.181 .064 .184 5.437
Fuel efficiency .026 .049 .087 .538 .592 -.071 .123 .217 4.602
a. Dependent Variable: LogSale
1. Remove variable 4-year resale value
The variable 4-year resale value has the highest sig. value (0.757), so it is removed first.
Model Summaryb
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .678a
.459 .425 1.01072
a. Predictors: (Constant), Fuel efficiency, Length, Price in thousands,
Width, Fuel capacity, Engine size, Wheelbase, Curb weight,
Horsepower
b. Dependent Variable: LogSale
Coefficientsa
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 (Constant) -.414 2.616 -.158 .874 -5.586 4.758
Price in
thousands
-.057 .013 -.616 -4.542 .000 -.082 -.032 .207 4.835
Engine size .408 .193 .321 2.110 .037 .026 .791 .164 6.095
Horsepower -.004 .004 -.151 -.828 .409 -.012 .005 .114 8.760
Wheelbase .061 .023 .355 2.711 .008 .017 .106 .222 4.499
Width -.049 .042 -.127 -1.175 .242 -.131 .033 .325 3.080
Length -.003 .013 -.028 -.217 .828 -.028 .023 .226 4.418
Curb weight .424 .342 .202 1.238 .218 -.253 1.100 .142 7.019
Fuel capacity -.026 .047 -.077 -.560 .576 -.118 .066 .200 4.991
Fuel efficiency .047 .039 .153 1.218 .225 -.030 .124 .241 4.144
2. Remove the constant term
The constant now has the highest sig. value (0.874), so it is removed next; the model therefore becomes a regression through the origin.
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.057 .012 -.498 -4.603 .000 -.082 -.033 .045 22.048
Engine size .413 .190 .376 2.174 .031 .038 .789 .018 56.259
Horsepower -.004 .004 -.194 -.836 .405 -.012 .005 .010 101.939
Wheelbase .061 .022 1.848 2.720 .007 .017 .105 .001 870.319
Width -.053 .030 -1.072 -1.757 .081 -.114 .007 .001 702.034
Length -.003 .013 -.141 -.209 .834 -.028 .023 .001 859.164
Curb weight .424 .341 .410 1.241 .216 -.251 1.098 .005 205.739
Fuel capacity -.027 .046 -.138 -.574 .567 -.118 .065 .009 109.013
Fuel efficiency .044 .034 .304 1.299 .196 -.023 .112 .010 102.955
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
3. Remove variable Length
Model Summaryc,d
Model R R Squareb
Adjusted R
Square
Std. Error of the
Estimate
1 .961a
.924 .920 1.00392
a. Predictors: Fuel efficiency, Price in thousands, Engine size, Fuel
capacity, Horsepower, Curb weight, Wheelbase, Width
b. For regression through the origin (the no-intercept model), R Square
measures the proportion of the variability in the dependent variable about
the origin explained by regression. This CANNOT be compared to R
Square for models which include an intercept.
c. Dependent Variable: LogSale
d. Linear Regression through the Origin
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.057 .012 -.492 -4.703 .000 -.080 -.033 .048 20.811
Engine size .411 .189 .374 2.173 .031 .037 .785 .018 56.089
Horsepower -.004 .004 -.205 -.904 .367 -.012 .004 .010 97.349
Wheelbase .058 .017 1.755 3.432 .001 .025 .091 .002 495.998
Width -.055 .030 -1.096 -1.833 .069 -.114 .004 .001 678.380
Curb weight .410 .334 .397 1.228 .221 -.250 1.070 .005 198.513
Fuel capacity -.026 .046 -.133 -.558 .578 -.117 .065 .009 108.047
Fuel efficiency .043 .033 .294 1.288 .200 -.023 .109 .010 99.044
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
4. Remove variable Fuel Capacity
Model Summaryc,d
Model R R Squareb
Adjusted R Square
Std. Error of the Estimate
1 .961a
.924 .920 1.00154
a. Predictors: Fuel efficiency, Price in thousands, Engine size, Curb
weight, Horsepower, Wheelbase, Width
b. For regression through the origin (the no-intercept model), R Square
measures the proportion of the variability in the dependent variable about
the origin explained by regression. This CANNOT be compared to R
Square for models which include an intercept.
c. Dependent Variable: LogSale
d. Linear Regression through the Origin
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.057 .012 -.499 -4.814 .000 -.081 -.034 .049 20.521
Engine size .410 .189 .372 2.172 .031 .037 .783 .018 56.081
Horsepower -.003 .004 -.188 -.841 .402 -.012 .005 .010 95.713
Wheelbase .055 .016 1.680 3.413 .001 .023 .087 .002 461.716
Width -.057 .030 -1.135 -1.916 .057 -.115 .002 .001 669.012
Curb weight .334 .304 .323 1.098 .274 -.267 .934 .006 164.991
Fuel efficiency .050 .031 .343 1.632 .105 -.011 .111 .012 84.344
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
5. Remove variable Horse Power
Model Summaryc,d
Model R R Squareb
Adjusted R
Square
Std. Error of the
Estimate
1 .961a
.924 .920 1.00053
a. Predictors: Fuel efficiency, Price in thousands, Engine size, Curbweight, Wheelbase, Width
b. For regression through the origin (the no-intercept model), R Square
measures the proportion of the variability in the dependent variable about
the origin explained by regression. This CANNOT be compared to R
Square for models which include an intercept.
c. Dependent Variable: LogSale
d. Linear Regression through the Origin
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.065 .008 -.566 -8.443 .000 -.080 -.050 .117 8.584
Engine size .308 .145 .280 2.128 .035 .022 .595 .030 33.125
Wheelbase .055 .016 1.662 3.383 .001 .023 .087 .002 460.831
Width -.062 .029 -1.235 -2.132 .035 -.119 -.004 .002 641.711
Curb weight .403 .292 .390 1.381 .169 -.174 .980 .007 152.731
Fuel efficiency .053 .031 .362 1.732 .085 -.007 .113 .012 83.408
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
6. Remove variable Curb weight
Model Summaryc,d
Model R R Squareb
Adjusted R
Square
Std. Error of the
Estimate
1 .961a
.923 .920 1.00022
a. Predictors: Fuel efficiency, Price in thousands, Engine size,
Wheelbase, Width
b. For regression through the origin (the no-intercept model), R Square
measures the proportion of the variability in the dependent variable about
the origin explained by regression. This CANNOT be compared to R
Square for models which include an intercept.
c. Dependent Variable: LogSale
d. Linear Regression through the Origin
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.062 .007 -.542 -8.407 .000 -.077 -.047 .125 7.975
Engine size .332 .143 .303 2.317 .022 .049 .615 .030 32.794
Wheelbase .063 .015 1.930 4.265 .000 .034 .093 .003 393.006
Width -.049 .027 -.989 -1.793 .075 -.103 .005 .002 583.801
Fuel efficiency .027 .024 .186 1.125 .262 -.021 .075 .019 52.739
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
7. Remove variable Fuel efficiency
Model Summaryc,d
Model R R Squareb
Adjusted R
Square
Std. Error of the
Estimate
1 .961a
.923 .921 .99628
a. Predictors: Width, Price in thousands, Engine size, Wheelbase
b. For regression through the origin (the no-intercept model), R Square
measures the proportion of the variability in the dependent variable about
the origin explained by regression. This CANNOT be compared to R
Square for models which include an intercept.
c. Dependent Variable: LogSale
d. Linear Regression through the Origin
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.062 .007 -.541 -8.480 .000 -.077 -.048 .125 8.005
Engine size .223 .106 .203 2.094 .038 .013 .432 .054 18.427
Wheelbase .061 .015 1.849 4.161 .000 .032 .090 .003 388.695
Width -.031 .022 -.630 -1.398 .164 -.076 .013 .003 399.409
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
8. Remove variable Width
Model Summaryc,d
Model R R Squareb
Adjusted R
Square
Std. Error of the
Estimate
1 .960a
.922 .921 .99940
a. Predictors: Wheelbase, Price in thousands, Engine size
b. For regression through the origin (the no-intercept model), R Square
measures the proportion of the variability in the dependent variable about
the origin explained by regression. This CANNOT be compared to R
Square for models which include an intercept.
c. Dependent Variable: LogSale
d. Linear Regression through the Origin
Coefficientsa,b
Model
Unstandardized
Coefficients
Standardized
Coefficients
t Sig.
95.0% Confidence
Interval for B
Collinearity
Statistics
B Std. Error Beta
Lower
Bound
Upper
Bound Tolerance VIF
1 Price in
thousands
-.064 .007 -.559 -8.913 .000 -.078 -.050 .130 7.684
Engine size .221 .107 .202 2.078 .039 .011 .432 .054 18.426
Wheelbase .041 .003 1.237 16.255 .000 .036 .046 .088 11.328
a. Dependent Variable: LogSale
b. Linear Regression through the Origin
Conclusion
After all these exercises we conclude that three predictors, (1) price in thousands, (2) engine size and (3) wheelbase, are significant. However, the VIF of engine size is very high (18.426), which shows that multicollinearity is present even though the coefficient is significant. It therefore depends on the researcher and the underlying theory whether or not to include this variable in the regression model. On the other hand, when the test was run before the transformation, the selected model contained four variables plus the constant term: (1) engine size, (2) horsepower, (3) wheelbase and (4) curb weight. After the transformation we obtain theoretically more plausible predictors than before, while the value of R-square is also much better.
Finally, the regression equation (excluding engine size because of its high VIF) is:
Logsale = -0.064 (price in thousands) + 0.041 (Wheelbase)
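As a quick sanity check, the fitted equation can be turned into a small prediction function. This is only a sketch: the base of the log transform used to create LogSale is not stated in the output, so base 10 is assumed here, and the example input values are hypothetical.

```python
def predict_log_sale(price_in_thousands, wheelbase):
    """Final enter/remove model: logSale = -0.064*price + 0.041*wheelbase."""
    return -0.064 * price_in_thousands + 0.041 * wheelbase

def predict_sales(price_in_thousands, wheelbase):
    """Back-transform to sales in thousands, assuming a base-10 log was used."""
    return 10 ** predict_log_sale(price_in_thousands, wheelbase)

# Hypothetical car: price 21.5 (thousand) and a 107-inch wheelbase
print(round(predict_log_sale(21.5, 107.0), 3))  # → 3.011
```

If the transformation was actually a natural log, replace `10 ** x` with `math.exp(x)`; the fitted coefficients themselves are unaffected.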
Enter & Remove by Backward Method
The same test, with approximately the same results, can be carried out using the backward method. We validate both the before- and after-transformation results with the backward method.
How to perform Backward Method? (Without transformation)
Variables Entered/Removedb
Model
Variables
Entered
Variables
Removed Method
1 Fuel efficiency,
4-year resale
value, Length,
Width, Engine
size, Fuel
capacity,
Wheelbase, Curb
weight,
Horsepower,
Price in
thousandsa
. Enter
2 . 4-year resale
value
Backward
(criterion:
Probability of F-
to-remove >=
.100).
3 . Price in
thousands
Backward
(criterion:
Probability of F-
to-remove >=
.100).
4 . Fuel capacity Backward
(criterion:
Probability of F-
to-remove >=
.100).
5 . Width Backward
(criterion:
Probability of F-
to-remove >=
.100).
6 . Length Backward
(criterion:
Probability of F-
to-remove >=
.100).
7 . Fuel efficiency Backward
(criterion:
Probability of F-
to-remove >=
.100).
a. All requested variables entered.
b. Dependent Variable: Sales in thousands
Model Summary
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .565a
.319 .255 64.798014
2 .565b
.319 .262 64.494517
3 .565c
.319 .269 64.195246
4 .565d
.319 .275 63.900650
5 .565e
.319 .282 63.614616
6 .563f
.317 .287 63.398826
7 .556g
.309 .285 63.486236
a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,
Width, Engine size, Fuel capacity, Wheelbase, Curb weight,
Horsepower, Price in thousands
b. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,
Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in
thousands
c. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,
Fuel capacity, Wheelbase, Curb weight, Horsepower
d. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,
Wheelbase, Curb weight, Horsepower
e. Predictors: (Constant), Fuel efficiency, Length, Engine size,
Wheelbase, Curb weight, Horsepower
f. Predictors: (Constant), Fuel efficiency, Engine size, Wheelbase, Curb
weight, Horsepower
g. Predictors: (Constant), Engine size, Wheelbase, Curb weight,
Horsepower
Coefficientsa
Model
Unstandardized Coefficients
Standardized
Coefficients
t Sig.
B Std. Error Beta
1 (Constant) -225.116 191.901 -1.173 .243
4-year resale value .011 2.253 .002 .005 .996
Price in thousands -.015 2.146 -.003 -.007 .994
Engine size 37.640 15.588 .529 2.415 .017
Horsepower -.612 .342 -.477 -1.788 .077
Wheelbase 6.391 1.787 .685 3.576 .001
Width -.375 3.120 -.018 -.120 .905
Length -.437 1.077 -.081 -.406 .686
Curb weight -69.476 29.602 -.553 -2.347 .021
Fuel capacity -.157 3.697 -.008 -.043 .966
Fuel efficiency -2.608 2.931 -.153 -.890 .376
2 (Constant) -224.919 186.969 -1.203 .232
Price in thousands -.006 .991 -.001 -.006 .996
Engine size 37.627 15.300 .529 2.459 .016
Horsepower -.611 .339 -.477 -1.801 .074
Wheelbase 6.392 1.769 .686 3.613 .000
Width -.374 3.099 -.018 -.121 .904
Length -.438 1.046 -.081 -.419 .676
Curb weight -69.529 27.528 -.553 -2.526 .013
Fuel capacity -.154 3.619 -.008 -.043 .966
Fuel efficiency -2.610 2.878 -.153 -.907 .367
3 (Constant) -225.048 184.706 -1.218 .226
Engine size 37.654 14.496 .529 2.597 .011
Horsepower -.613 .213 -.478 -2.881 .005
Wheelbase 6.392 1.758 .686 3.636 .000
Width -.371 3.057 -.017 -.121 .904
Length -.437 1.019 -.081 -.429 .669
Curb weight -69.586 25.500 -.554 -2.729 .007
Fuel capacity -.156 3.590 -.008 -.043 .966
Fuel efficiency -2.613 2.827 -.153 -.924 .357
4 (Constant) -225.414 183.665 -1.227 .222
Engine size 37.728 14.328 .530 2.633 .010
Horsepower -.614 .211 -.479 -2.906 .004
Wheelbase 6.364 1.619 .683 3.929 .000
Width -.395 2.993 -.019 -.132 .895
Length -.424 .967 -.078 -.438 .662
Curb weight -70.015 23.394 -.557 -2.993 .003
Fuel efficiency -2.561 2.557 -.150 -1.002 .319
5 (Constant) -242.535 129.494 -1.873 .064
Engine size 37.238 13.775 .523 2.703 .008
Horsepower -.611 .209 -.477 -2.919 .004
Wheelbase 6.346 1.607 .681 3.949 .000
Length -.460 .924 -.085 -.498 .619
Curb weight -70.466 23.039 -.561 -3.059 .003
Fuel efficiency -2.554 2.545 -.150 -1.004 .318
6 (Constant) -247.393 128.688 -1.922 .057
Engine size 36.371 13.619 .511 2.671 .009
Horsepower -.626 .207 -.489 -3.031 .003
Wheelbase 5.745 1.057 .616 5.436 .000
Curb weight -71.956 22.767 -.572 -3.161 .002
Fuel efficiency -2.831 2.475 -.166 -1.144 .255
7 (Constant) -353.714 89.146 -3.968 .000
Engine size 39.752 13.313 .559 2.986 .003
Horsepower -.638 .207 -.498 -3.087 .003
Wheelbase 5.556 1.045 .596 5.315 .000
Curb weight -56.888 18.597 -.453 -3.059 .003
a. Dependent Variable: Sales in thousands
In the end we find the same significant variables that we found with the enter-and-remove method, with approximately the same significance values.
How to perform Backward Method? (After transformation)
Variables Entered/Removedb
Model Variables Entered
Variables
Removed Method
1 Fuel efficiency, 4-year resale value, Length,
Width, Engine size, Fuel capacity, Wheelbase,
Curb weight, Horsepower, Price in thousandsa
. Enter
2 . 4-year resale value Backward (criterion: Probability of F-to-
remove >= .100).
3 . Horsepower Backward (criterion: Probability of F-to-
remove >= .100).
4 . Width Backward (criterion: Probability of F-to-
remove >= .100).
5 . Fuel efficiency Backward (criterion: Probability of F-to-
remove >= .100).
6 . Curb weight Backward (criterion: Probability of F-to-
remove >= .100).
7 . Length Backward (criterion: Probability of F-to-
remove >= .100).
8 . Engine size Backward (criterion: Probability of F-to-
remove >= .100).
9 . Fuel capacity Backward (criterion: Probability of F-to-
remove >= .100).
a. All requested variables entered.
b. Dependent Variable: LogSale
Model Summary
Model R R Square
Adjusted R
Square
Std. Error of the
Estimate
1 .634a
.402 .345 1.08253
2 .633b
.401 .351 1.07796
3 .632c
.400 .355 1.07419
4 .631d
.398 .360 1.07068
5 .629e
.396 .363 1.06766
6 .628f
.394 .367 1.06480
7 .621g
.386 .364 1.06719
8 .618h
.382 .365 1.06590
9 .613i
.376 .365 1.06600
a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,
Width, Engine size, Fuel capacity, Wheelbase, Curb weight,
Horsepower, Price in thousands
b. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,
Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in
thousands
c. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,
Fuel capacity, Wheelbase, Curb weight, Price in thousands
d. Predictors: (Constant), Fuel efficiency, Length, Engine size, Fuel
capacity, Wheelbase, Curb weight, Price in thousands
e. Predictors: (Constant), Length, Engine size, Fuel capacity,
Wheelbase, Curb weight, Price in thousands
f. Predictors: (Constant), Length, Engine size, Fuel capacity,
Wheelbase, Price in thousands
g. Predictors: (Constant), Engine size, Fuel capacity, Wheelbase, Price
in thousands
h. Predictors: (Constant), Fuel capacity, Wheelbase, Price in thousands
i. Predictors: (Constant), Wheelbase, Price in thousands
Coefficientsa
Model
Unstandardized Coefficients
Standardized
Coefficients
t Sig.
B Std. Error Beta
1 (Constant) -1.097 3.206 -.342 .733
4-year resale value -.012 .038 -.101 -.311 .757
Price in thousands -.036 .036 -.385 -1.015 .312
Engine size .310 .260 .244 1.190 .237
Horsepower -.003 .006 -.118 -.470 .639
Wheelbase .093 .030 .559 3.111 .002
Width -.026 .052 -.068 -.492 .624
Length -.018 .018 -.188 -1.008 .316
Curb weight .262 .495 .117 .530 .597
Fuel capacity -.059 .062 -.166 -.949 .345
Fuel efficiency .026 .049 .087 .538 .592
2 (Constant) -1.301 3.125 -.416 .678
Price in thousands -.046 .017 -.489 -2.793 .006
Engine size .323 .256 .255 1.264 .209
Horsepower -.003 .006 -.124 -.497 .620
Wheelbase .092 .030 .553 3.108 .002
Width -.027 .052 -.071 -.516 .607
Length -.017 .017 -.175 -.968 .335
Curb weight .317 .460 .141 .689 .493
Fuel capacity -.062 .060 -.176 -1.027 .307
Fuel efficiency .029 .048 .095 .599 .551
3 (Constant) -1.344 3.113 -.432 .667
Price in thousands -.053 .010 -.557 -5.065 .000
Engine size .238 .188 .187 1.262 .210
Wheelbase .094 .029 .564 3.210 .002
Width -.028 .052 -.073 -.537 .592
Length -.019 .017 -.199 -1.147 .254
Curb weight .377 .442 .168 .853 .395
Fuel capacity -.062 .060 -.175 -1.024 .308
Fuel efficiency .031 .048 .103 .653 .515
4 (Constant) -2.502 2.239 -1.117 .266
Price in thousands -.052 .010 -.547 -5.062 .000
Engine size .204 .177 .161 1.153 .251
Wheelbase .094 .029 .565 3.224 .002
Length -.022 .016 -.228 -1.380 .170
Curb weight .354 .439 .158 .806 .422
Fuel capacity -.068 .059 -.192 -1.150 .253
Fuel efficiency .029 .047 .096 .617 .539
5 (Constant) -1.553 1.622 -.958 .340
Price in thousands -.051 .010 -.539 -5.039 .000
Engine size .167 .166 .132 1.006 .316
Wheelbase .096 .029 .579 3.340 .001
Length -.021 .016 -.218 -1.331 .186
Curb weight .262 .411 .117 .637 .526
Fuel capacity -.083 .053 -.236 -1.556 .123
6 (Constant) -1.771 1.581 -1.120 .265
Price in thousands -.050 .010 -.524 -5.037 .000
Engine size .199 .158 .157 1.256 .212
Wheelbase .098 .029 .589 3.425 .001
Length -.019 .015 -.196 -1.226 .223
Fuel capacity -.063 .042 -.177 -1.473 .143
7 (Constant) -2.338 1.515 -1.542 .126
Price in thousands -.050 .010 -.525 -5.043 .000
Engine size .125 .147 .099 .852 .396
Wheelbase .070 .017 .422 4.011 .000
Fuel capacity -.050 .041 -.141 -1.205 .231
8 (Constant) -2.593 1.484 -1.747 .083
Price in thousands -.045 .008 -.474 -5.595 .000
Wheelbase .073 .017 .441 4.292 .000
Fuel capacity -.040 .040 -.113 -1.011 .314
9 (Constant) -1.920 1.326 -1.448 .150
Price in thousands -.049 .007 -.515 -6.945 .000
Wheelbase .061 .012 .369 4.980 .000
a. Dependent Variable: LogSale
Here only two variables, price in thousands and wheelbase, are significant, while the constant term is insignificant for the model.
Finally, the regression equation will be:
logSale = - 0.049(price in thousands) + 0.061(Wheelbase)
The result generated by the enter-and-remove method was:
Logsale = - 0.064 (price in thousands) + 0.041(Wheelbase)
Comparing the two regression models, we find that both methods select the same variables, with only slight differences in the coefficients; a minor difference in the beta coefficients is therefore to be expected.
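The backward elimination procedure used above can be sketched in a few lines. This is a minimal illustration, not SPSS's exact algorithm: it removes on a |t| cutoff rather than the probability-of-F-to-remove criterion (for a single coefficient F = t², so the two are closely related), and the data and predictor names (x1, x2, x3) are made up for the demonstration.

```python
import numpy as np

def fit_ols_origin(X, y):
    """OLS through the origin: coefficients, standard errors, t statistics."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    dof = len(y) - X.shape[1]        # residual degrees of freedom
    sigma2 = resid @ resid / dof     # residual variance estimate
    se = np.sqrt(np.diag(XtX_inv) * sigma2)
    return beta, se, beta / se

def backward_eliminate(X, y, names, t_cut=2.0):
    """Repeatedly drop the predictor with the smallest |t| until all exceed t_cut."""
    names = list(names)
    while len(names) > 1:
        _, _, t = fit_ols_origin(X, y)
        worst = int(np.argmin(np.abs(t)))
        if abs(t[worst]) >= t_cut:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names, X

# Tiny orthogonal example: y depends on x1 and x2 but not on x3
X = np.array([[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]], dtype=float)
y = np.array([1.0, 3.0, 0.0, 4.0])   # 2*x1 - 1.5*x2 plus orthogonal noise
kept, _ = backward_eliminate(X, y, ["x1", "x2", "x3"])
print(kept)  # → ['x1', 'x2']
```

In a real analysis one would start from the full design matrix (with an intercept column unless it has already been removed, as in the through-origin models above).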
Tolerance and Variance Inflation Factor (VIF)
Tolerance and VIF are reciprocals of each other. They measure the independence of the independent variables, that is, the degree of multicollinearity among them: how much of the variation in one independent variable is explained by the other independent variables in the model. To calculate them, take one of the independent variables as the dependent variable, regress it on the remaining independent variables, and record the resulting R-square. This R-square, denoted R2*, is the proportion of that independent variable which is explained by the other independent variables.
Tolerance is calculated as 1 - R2*. The higher the tolerance, the lower the multicollinearity.
The variance inflation factor is calculated as VIF = 1/Tolerance. VIF measures the degree to which the standard error of a coefficient has been inflated by multicollinearity. As R2* increases, tolerance decreases and VIF increases, which means multicollinearity increases.
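The procedure just described (regress each predictor on the others, take R2* from that fit, then Tolerance = 1 - R2* and VIF = 1/Tolerance) can be sketched directly. A minimal illustration with NumPy; the three-column data set is invented for the demonstration, with x2 deliberately constructed as x1 plus tiny noise so that its VIF blows up:

```python
import numpy as np

def tolerance_and_vif(X):
    """For each column j: regress it on the remaining columns (with an
    intercept), take R2* from that fit, then Tolerance = 1 - R2* and
    VIF = 1 / Tolerance."""
    n, p = X.shape
    results = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2_star = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        tol = 1.0 - r2_star
        results.append((tol, 1.0 / tol))
    return results

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = x1 + np.array([0.01, -0.02, 0.03, -0.01, 0.02, -0.03])  # near-duplicate of x1
x3 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])             # unrelated pattern
X = np.column_stack([x1, x2, x3])
for name, (tol, vif) in zip(["x1", "x2", "x3"], tolerance_and_vif(X)):
    print(f"{name}: tolerance={tol:.4f}, VIF={vif:.1f}")
```

Running this shows an enormous VIF for the near-duplicated column and a modest one for x3, mirroring the pattern of large VIFs seen for Wheelbase and Width in the through-origin tables above.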