Regression analysis
Advanced Financial Accounting II
Åbo Akademi School of Business
Regression analysis
A statistical process for estimating the relationships
among variables
Includes many techniques for modeling and analyzing
several variables, when the focus is on the relationship
between a dependent variable and one or more
independent variables
Helps one understand how the typical value of the
dependent variable (or 'Criterion Variable') changes
when any one of the independent variables is varied,
while the other independent variables are held fixed
Regression models and regression
function
Regression models involve the following variables:
– The unknown parameters, b, which may represent a
scalar or a vector.
– The independent variables, X.
– The dependent variable, Y.
A regression model relates Y to a function of X and b
Y = f(X,b)
Regression model and regression
function...
Regression analysis estimates the conditional
expectation of the dependent variable given the
independent variables
E(Y | X) = f(X,b)
The estimation target is the regression function
f(X,b)
It is also of interest to characterize the variation of the
dependent variable around the regression function,
which can be described by a probability distribution
Linear regression
In linear regression, the model specification is that the
dependent variable is a linear combination of the
parameters b
– it need not be linear in the independent variables X
For example, in simple linear regression for modeling n
data points there is one independent variable X, and
two parameters, b_0 and b_1, giving the straight line
y_i = b_0 + b_1 x_i + e_i
e_i is an error term and the subscript i indexes a
particular observation
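As an illustration of how b_0 and b_1 can be estimated, the following is a minimal sketch of ordinary least squares for one independent variable. The function name `fit_simple_ols` and the data points are made up for this example and do not come from the course material.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared errors for y = b0 + b1*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Made-up illustration data (assumption, not from the exercise file).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
# The residuals are the estimated error terms e_i, one per observation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.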
Simple linear regression
Example of simple linear regression, which has one
independent variable
Diagnostics
Once a regression model has been constructed, it
may be important to confirm the goodness of fit of the
model and the statistical significance of the estimated
parameters
Commonly used checks of goodness of fit include
– the coefficient of determination R2
– analyses of the pattern of residuals
– hypothesis testing
Statistical significance can be checked by
– F-test of the overall fit
– t-tests of individual parameters
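To make the two significance checks concrete, here is a hedged sketch that computes the F-statistic of the overall fit and the t-statistic of the slope for a simple linear regression. The function name and the data are assumptions made for illustration. The sketch returns only the test statistics; turning them into p-values would need the F and t distributions, which are left out here.

```python
import math


def simple_regression_tests(x, y):
    """Return (F, t_slope) for y = b0 + b1*x fitted by least squares."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    p = 1                                    # one independent variable
    mse = ss_res / (n - p - 1)               # estimate of the error variance
    f_stat = ((ss_tot - ss_res) / p) / mse   # F-test of the overall fit
    t_slope = b1 / math.sqrt(mse / sxx)      # slope divided by its standard error
    return f_stat, t_slope


# Made-up illustration data (assumption, not from the exercise file).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

f_stat, t_slope = simple_regression_tests(x, y)
print(f_stat, t_slope)
```

With a single independent variable the two tests coincide: F equals the square of the slope's t-statistic. The cut-off 1.96 used for t-values is a large-sample approximation; for very small samples the critical value of the t distribution is larger.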
Goodness of fit – Coefficient of
determination R2
The coefficient of determination, R2 indicates how well
data points fit a line or curve
Provides a measure of how well observed outcomes
are replicated by the model, as the proportion of total
variation of outcomes explained by the model
The better the linear regression fits the data, the
closer the value of R2 is to one
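$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where
$SS_{res} = \sum_i (y_i - f_i)^2$, the residual sum of squares
$SS_{tot} = \sum_i (y_i - \bar{y})^2$, the total sum of squares
($f_i$ is the value fitted by the model for observation i and $\bar{y}$ is the mean of the observed values)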
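The coefficient of determination, R2 = 1 − SSres / SStot, can be computed directly from the observed values and the values fitted by the model. The sketch below uses made-up numbers and a hypothetical helper name `r_squared`.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    return 1 - ss_res / ss_tot


# Made-up observations (assumption for illustration).
y = [2.0, 4.0, 6.0, 8.0]

perfect_fit = r_squared(y, [2.0, 4.0, 6.0, 8.0])  # model reproduces every point
mean_only = r_squared(y, [5.0, 5.0, 5.0, 5.0])    # model always predicts the mean
print(perfect_fit, mean_only)
```

The two cases mark the ends of the usual scale: a model that reproduces every observation gives 1, and a model that only predicts the mean gives 0.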
Goodness of fit – Adjusted R2
R2 automatically increases when extra explanatory
variables are added to the model
Some of the increase may be due to spurious effects
A modification of R2 adjusts for the number of
explanatory terms in a model relative to the number
of data points
Unlike R2, the adjusted R2 increases when a new
explanator is included only if the new explanator
improves the R2 more than would be expected in the
absence of any explanatory value being added by the
new explanator
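The usual adjustment can be written as adjusted R2 = 1 − (1 − R2)(n − 1)/(n − p − 1), with n data points and p explanatory variables. The sketch below applies it to made-up R2 values (not SPSS output) to show how a very small gain in R2 from an extra explanator can still lower the adjusted R2.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R2 for n observations and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical values: a second explanator lifts R2 only from 0.500 to 0.501.
one_explanator = adjusted_r_squared(0.500, n=100, p=1)
two_explanators = adjusted_r_squared(0.501, n=100, p=2)
print(one_explanator, two_explanators)
```

Here the adjusted R2 of the two-variable model is lower than that of the one-variable model, even though its plain R2 is higher.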
Simple linear regression analysis – an
example
Research question: Does the amount of money spent
on advertising affect the yearly sales of a company?
Data: File: AFAII_Regression_Excercise.xlsx
– Yearly sales (Sales)
– Amount spent on advertising (AdvTotal)
for 100 companies
Regression equation to estimate:
Sales_i = b_0 + b_1 AdvTotal_i + e_i
Simple regression analysis with SPSS
Analyze
Regression
Linear
Move Sales to Dependent
Move AdvTotal to Independent(s)
OK
Simple Linear Regression Analysis
with SPSS – Interpretation – Model fit
Adjusted R2 = 0.375
37.5 % of the variation in
the yearly sales is explained
by the amount spent on
advertising – all other
factors fixed
Simple Linear Regression Analysis with
SPSS – Significance of total model
The F-statistic for
the total model is
significant at the 5 % level
Simple Linear Regression Analysis with
SPSS – Interpretation – Coefficients
t-values for both the Constant and the
independent variable AdvTotal are >
1.96, so the parameter estimates are
significant at the 5 % level
Estimated regression equation
Sales_i = 11 890.599 + 4.914 AdvTotal_i + e_i
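The estimated equation can be used to predict sales for a given advertising budget. The sketch below plugs in the two coefficients from the slide; reading the constant as 11890.599 (a decimal fraction, not a thousands grouping) is an assumption, and the input value 1000 is made up. The units are whatever units the exercise file uses.

```python
# Coefficients taken from the estimated regression equation on the slide.
B0 = 11890.599   # Constant (decimal reading is an assumption)
B1 = 4.914       # coefficient of AdvTotal


def predict_sales(adv_total):
    """Predicted yearly sales for a given total advertising spend."""
    return B0 + B1 * adv_total


# Hypothetical advertising spend, in the same units as AdvTotal.
prediction = predict_sales(1000)
# The slope is the predicted change in sales per extra unit of advertising.
marginal_effect = predict_sales(1001) - predict_sales(1000)
print(prediction, marginal_effect)
```

The error term drops out of a prediction because its expected value is zero.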
Multiple linear regression analysis
In the more general multiple regression model, there are p independent variables:
y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + … + b_p x_{ip} + e_i
The predictor variables have to be linearly independent, i.e. it is not possible to express any predictor as a linear combination of the others
Highly correlated predictor variables lead to multicollinearity problems where the coefficient estimates may change erratically in response to small changes in the model or the data
– Multicollinearity does not reduce the predictive power
or reliability of the model as a whole but it may not give valid results about any individual predictor
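One simple, partial screen for multicollinearity is to look at the pairwise correlations between the predictors before fitting. The sketch below is an illustration under assumptions: the data are made up, the 0.9 threshold is an arbitrary choice, and pairwise correlation cannot detect a predictor that is close to a combination of several others.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def flag_correlated(predictors, threshold=0.9):
    """Return (name1, name2, r) for every predictor pair with |r| >= threshold."""
    names = list(predictors)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(predictors[first], predictors[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged


# Made-up data: AdvWeb is constructed to move almost in step with AdvTV.
predictors = {
    "AdvTV": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "AdvWeb": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "AdvPress": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
flagged = flag_correlated(predictors)
print(flagged)
```

A flagged pair is a hint that the individual coefficients of those two predictors may be unstable, not proof that the model as a whole is unusable.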
Multiple linear regression analysis –
an example
Research question: Do the amounts of money spent on advertising in TV, web, and press affect the yearly sales of a company?
Data: File: AFAII_Regression_Excercise.xlsx
– Yearly sales (Sales)
– Amount spent on advertising in TV (AdvTV)
– Amount spent on advertising in web (AdvWeb)
– Amount spent on advertising in press (AdvPress)
for 100 companies
Regression equation to estimate:
Sales_i = b_0 + b_1 AdvTV_i + b_2 AdvWeb_i + b_3 AdvPress_i + e_i
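An equation of this form, with an intercept and three predictors, can be estimated by least squares through the normal equations. The following is a sketch with standard-library Python only; the function names and the six data rows are made up, and the data are noise-free so that the known coefficients are recovered. SPSS performs the estimation itself, so this only illustrates what is being computed.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens when one predictor is a linear combination of others.
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """rows: one list of predictor values per observation. Returns [b0, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up rows of (AdvTV, AdvWeb, AdvPress); not the exercise data.
rows = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [2, 1, 3]]
# Generated without noise from Sales = 10 + 2*AdvTV + 3*AdvWeb - 1*AdvPress.
sales = [10 + 2 * tv + 3 * web - press for tv, web, press in rows]

coefficients = fit_ols(rows, sales)
print(coefficients)
```

The explicit check for a zero pivot reflects the requirement that the predictors be linearly independent: without it the normal equations have no unique solution.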
Multiple linear regression analysis
with SPSS
Analyze
Regression
Linear
Move Sales to Dependent
Move AdvTV, AdvWeb, and AdvPress to
Independent(s)
Method: Enter
OK
MLR with SPSS – Interpretation
Coefficients for all three
independent variables
are estimated
MLR with SPSS – Interpretation –
Goodness of fit
Adjusted R2 = 0.398
39.8 % of the variation in
the yearly sales is explained
by the amount spent on
advertising in TV, web and
press
MLR with SPSS – Interpretation –
Significance of total model
The F-statistic for
the total model is
significant at the 5 % level
MLR with SPSS – Interpretation –
Coefficients
Coefficients for AdvTV and
AdvWeb are significant at the 5 % level
(t-value > 1.96, significance <
0.05). The Constant and the coefficient
for AdvPress are insignificant
Stepwise regression models
The method Enter estimates a model simultaneously
including all the suggested variables that pass some
predefined criteria
The insignificance of one of the suggested predictor
variables, AdvPress, suggests that a more suitable
model could be found by eliminating this variable
In order to find a suitable variable combination, a
stepwise estimation process may be selected
In SPSS: Method: Stepwise
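The idea of a stepwise search can be sketched as a greedy forward selection. Note that this is a simplified stand-in, not the SPSS procedure: it adds the variable that raises the adjusted R2 the most and stops when no candidate raises it, whereas SPSS decides entry and removal with F-test probabilities. All names and the eight data points are made up; x3 is constructed to carry no information about y.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def adjusted_r2(columns, y):
    """Adjusted R2 of a least-squares fit of y on an intercept plus the columns."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bj * dj for bj, dj in zip(b, d)) for d in design]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - p - 1)


def forward_select(candidates, y):
    """Greedily add the predictor that most improves adjusted R2; stop when none does."""
    chosen, best = [], 0.0  # the intercept-only model has adjusted R2 = 0
    while len(chosen) < len(candidates):
        trials = [
            (adjusted_r2([candidates[c] for c in chosen + [name]], y), name)
            for name in candidates if name not in chosen
        ]
        top_score, top_name = max(trials)
        if top_score <= best:
            break
        chosen.append(top_name)
        best = top_score
    return chosen, best


# Made-up data: y depends on x1 and x2 plus a small disturbance; x3 is irrelevant.
candidates = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
}
noise = [0.5, 0.5, -0.5, -0.5, -0.5, -0.5, 0.5, 0.5]
y = [10 + 4 * a + 2 * b + e for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]

chosen, best = forward_select(candidates, y)
print(chosen, best)
```

In this constructed example the search enters x1 first, then x2, and leaves x3 outside the model because adding it would lower the adjusted R2.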
Stepwise MLR with SPSS
The variables AdvTV and
AdvWeb were entered in the
regression model in the
order in which they improve the
total model significance
(F-statistic). AdvPress was left
outside the model.
Stepwise MLR with SPSS –
Development of Goodness of fit
Entering the second
independent variable
AdvWeb increases the
explanatory power of the
model from 34.9 % to
39.4 %
Stepwise MLR with SPSS –
Coefficients
t-values for both the Constant and the
independent variables AdvTV and
AdvWeb are > 1.96, so the parameter
estimates are significant at the 5 % level
Estimated regression equation
Sales_i = 8 450.755 + 4.549 AdvTV_i + 21.532 AdvWeb_i + e_i