vikrant-khadilkar
7/28/2019 12.Simple and Multiple Regression Analysis-LDR 280
Regression Analysis
$\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$
[Diagram: X1, X2, and X3 in relation to y]
STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS?
1. Compare Groups
a. Compare Proportions (e.g., Chi-Square Test, $\chi^2$)
H0: $P_1 = P_2 = P_3 = \dots = P_k$
b. Compare Means (e.g., Analysis of Variance)
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
2. Examine Strength and Direction of Relationships
a. Bivariate (e.g., Pearson Correlation, r) Between one variable and another: $Y = a + b_1 x_1$
b. Multivariate (e.g., Multiple Regression Analysis) Between one dependent variable and each of several independent variables, while holding all other independent variables constant:
$Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$
Simple and Multiple Regression Analysis
Examines whether changes/differences in values of one variable (dependent variable Y) are linked to changes/differences in values of one or more other variables (independent variables X1, X2, etc.), while controlling for the changes in values of all other Xs.
E.g., the relationship between salary and gender for people who have the same levels of education, work experience, position level, seniority, etc.
The DV (Y) must be metric.
The IVs (Xs) must be either metric or dummy variables.
Central Question Addressed:
Is Y a function of X1, X2, etc.? How? Is there a relationship between Y and X1, X2, etc. (in each case, after controlling for the effects of all other Xs)? In what way?
What is the relative impact of each X on Y, holding all other Xs constant (that is, all other Xs being equal)?
What does regression analysis do?
Simple and Multiple Regression Analysis
More specifically,
Do values of Y tend to increase/decrease as values of X1, X2, etc. increase/decrease?
If so,
By how much?
And
How strong is the connection/relationship between Xs and Y? That is, what % of differences/variations in Y values (e.g., income) among study subjects can be explained by (or attributed to) differences in X values (e.g., years of education, years of experience, etc.)?
[Diagram: X1, X2, and X3 in relation to y]
Simple and Multiple Regression Analysis
NOTE: Once we can determine how values of Y change as a function of values of X1, X2, etc., we will also be able to predict/estimate the value of Y from specific values of X1, X2, etc.
$Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k + \varepsilon$
Therefore, regression analysis, in a sense, is about ESTIMATING values of Y, using information about values of Xs.
Estimation, by definition, involves?
The objective?
To minimize error in estimation.
Or, to compute estimates that are as close to the true/actual values as possible.
Simple and Multiple Regression Analysis
Estimating Number of Credit Cards
Family Number (i) | Actual # of Credit Cards ($y_i$) | Estimate for # of Credit Cards ($\hat{y} = \bar{y}$) | Error in Estimation ($y_i - \bar{y}$)
1 | 4 | 7 | ?
2 | 6 | 7 | ?
3 | 6 | 7 | ?
4 | 7 | 7 | ?
5 | 8 | 7 | ?
6 | 7 | 7 | ?
7 | 8 | 7 | ?
8 | 10 | 7 | ?
$\sum y_i = 56$, $\bar{y} = 56 / 8 = 7$
Simple and Multiple Regression Analysis
Estimating Number of Credit Cards
Family Number (i) | Actual # of Credit Cards ($y_i$) | Estimate for # of Credit Cards ($\hat{y} = \bar{y}$) | Error in Estimation ($y_i - \bar{y}$)
1 | 4 | 7 | -3
2 | 6 | 7 | -1
3 | 6 | 7 | -1
4 | 7 | 7 | 0
5 | 8 | 7 | +1
6 | 7 | 7 | 0
7 | 8 | 7 | +1
8 | 10 | 7 | +3
$\sum y_i = 56$, $\bar{y} = 56 / 8 = 7$
Let's now see all this graphically.
Simple and Multiple Regression Analysis
[Figure: actual number of credit cards (Y, axis 0 to 10) for families F1 to F8, plotted against the baseline estimate $\hat{Y} = \bar{y} = 7$]
Let's spread the dots away from each other to see things more clearly!
Simple and Multiple Regression Analysis
Graphic Representation
[Figure: the same plot with families F1 to F8 spread out horizontally; each family's estimation error is the vertical distance between its actual value and the estimate $\hat{Y} = 7$]
Can we determine the total estimation error for all 8 families?
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards ($y_i$) | Estimate for # of Credit Cards ($\hat{y} = \bar{y}$) | Error in Estimation ($y_i - \bar{y}$)
1 | 4 | 7 | -3
2 | 6 | 7 | -1
3 | 6 | 7 | -1
4 | 7 | 7 | 0
5 | 8 | 7 | +1
6 | 7 | 7 | 0
7 | 8 | 7 | +1
8 | 10 | 7 | +3
$\sum y_i = 56$, $\bar{y} = 56 / 8 = 7$
What would be the total estimation error for all 8 families combined?
$\sum (y_i - \bar{y}) = 0$
Solution?
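The zero total is not specific to this sample. As a short worked step (standard algebra, not taken from the slides), deviations from the mean always cancel:

```latex
\sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} y_i - n\bar{y}
  = 56 - 8(7)
  = 0
```

This is why a plain sum of errors cannot serve as an index of total estimation error, and why some way of removing the signs is needed.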
Simple and Multiple Regression Analysis
Estimating Number of Credit Cards
Family Number (i) | Actual # of Credit Cards ($y_i$) | Estimate for # of Credit Cards ($\hat{y} = \bar{y}$) | Error in Estimation ($y_i - \bar{y}$) | Errors Squared ($(y_i - \bar{y})^2$)
1 | 4 | 7 | -3 | 9
2 | 6 | 7 | -1 | 1
3 | 6 | 7 | -1 | 1
4 | 7 | 7 | 0 | 0
5 | 8 | 7 | +1 | 1
6 | 7 | 7 | 0 | 0
7 | 8 | 7 | +1 | 1
8 | 10 | 7 | +3 | 9
$\sum y_i = 56$, $\bar{y} = 56 / 8 = 7$, $\sum (y_i - \bar{y}) = 0$
$\sum (y_i - \bar{y})^2 = 22$
SST = Sum of Squares Total
Simple and Multiple Regression Analysis
22 = SST = Index for total (combined) amount of estimation error for all families (observations) in the sample when using the mean as the estimate.
SST is also the sum of squared deviations from the mean.
o Remember the formula for computing Variance?
Objective in Estimation?
Minimize error, maximize precision.
Can we cut down the amount of estimation error (SST)? How?
Yes, we can, by using information about other variables suspected to be strong predictors of (strongly related to) # of credit cards possessed by families (e.g., family size, family income, etc.).
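The baseline calculation can be sketched in a few lines of Python. This is only an illustration using the eight sample values from the table; the variable names are made up for the sketch, and the variance line assumes the usual sample variance (SST divided by n - 1).

```python
# Actual number of credit cards for the 8 sample families (from the table).
cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(cards)
mean_cards = sum(cards) / n                  # baseline estimate: the mean (7)
errors = [y - mean_cards for y in cards]     # y_i - y_bar for each family
total_error = sum(errors)                    # signed errors cancel out
sst = sum(e ** 2 for e in errors)            # Sum of Squares Total
variance = sst / (n - 1)                     # sample variance is SST / (n - 1)

print(mean_cards, total_error, sst)
```

Squaring removes the signs, so SST grows with every family whose value differs from the mean, whichever direction the difference goes.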
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards (y) | Family Size (x)
1 | 4 | 2
2 | 6 | 2
3 | 6 | 4
4 | 7 | 4
5 | 8 | 5
6 | 7 | 5
7 | 8 | 6
8 | 10 | 6
We now can attempt to estimate # of credit cards from the information on family size, rather than from its own mean.
Let's first see this graphically!
Simple and Multiple Regression Analysis
[Figure: scatter plot of # of credit cards (Y, 0 to 10) against Family Size (X, 1 to 7) for families F1 to F8, showing the original (baseline) estimate as a horizontal line, several candidate lines $\hat{y} = a_1 + b_1 x$, $\hat{y} = a_2 + b_2 x$, $\hat{y} = a_3 + b_3 x$, and the regression line]
Generic equation for any straight line: $Y = a + bx$
The original (baseline) estimate is itself a line with zero slope: $\hat{y} = a + 0x = \bar{y}$
Regression Line (Line of Best Fit): new improved location for CC estimates (see next slide)
Simple and Multiple Regression Analysis
[Figure: scatter plot of # of credit cards (Y) against Family Size (X) for families F1 to F8, with the original (baseline) estimate $\bar{y}$ and the regression line $\hat{y}$; the vertical distance $(y - \hat{y})$ between a point and the regression line is the estimation ERROR]
Reg. Line (Line of Best Fit): new improved location for CC estimates.
The regression line will minimize $\sum (y - \hat{y})^2$ = total estimation error.
$\hat{y} = a + bx$
But how do we know the values of a and b in $\hat{y} = a + bx$ (the reg. line)?
EQUATION FOR REGRESSION LINE (LINE OF BEST FIT):
$\hat{y} = a + bx$, where y is the actual # of credit cards
Values of a and b for the regression line:
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
$a = \bar{y} - b\bar{x}$
Let's use the above formulas to compute the values of a and b for the regression line in our example.
We will need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$
Simple and Multiple Regression Analysis
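Family Number (i) | Actual # of Credit Cards (y) | Family Size (x) | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$
1 | 4 | 2 | ? | ? | ? | ?
2 | 6 | 2 | ? | ? | ? | ?
3 | 6 | 4 | ? | ? | ? | ?
4 | 7 | 4 | ? | ? | ? | ?
5 | 8 | 5 | ? | ? | ? | ?
6 | 7 | 5 | ? | ? | ? | ?
7 | 8 | 6 | ? | ? | ? | ?
8 | 10 | 6 | ? | ? | ? | ?
$\bar{y} = 56 / 8 = 7$, $\bar{x} = 34 / 8 = 4.25$
$\sum (x - \bar{x})(y - \bar{y}) = ?$, $\sum (x - \bar{x})^2 = ?$
We need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$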
Simple and Multiple Regression Analysis
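Family Number (i) | Actual # of Credit Cards (y) | Family Size (x) | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ | $(x - \bar{x})^2$
1 | 4 | 2 | -2.25 | -3 | 6.75 | 5.0625
2 | 6 | 2 | -2.25 | -1 | 2.25 | 5.0625
3 | 6 | 4 | -.25 | -1 | .25 | .0625
4 | 7 | 4 | -.25 | 0 | 0 | .0625
5 | 8 | 5 | .75 | 1 | .75 | .5625
6 | 7 | 5 | .75 | 0 | 0 | .5625
7 | 8 | 6 | 1.75 | 1 | 1.75 | 3.0625
8 | 10 | 6 | 1.75 | 3 | 5.25 | 3.0625
$\bar{y} = 56 / 8 = 7$, $\bar{x} = 34 / 8 = 4.25$
$\sum (x - \bar{x})(y - \bar{y}) = 17$, $\sum (x - \bar{x})^2 = 17.5$
We need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$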
Simple and Multiple Regression Analysis
REGRESSION LINE (LINE OF BEST FIT):
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{17}{17.5} = .971$
$a = \bar{y} - b\bar{x} = 7 - .971(4.25) = 2.87$
$\hat{y} = a + bx = 2.87 + .97x$
a = 2.87: Y-Intercept
b = .97: Regression Coefficient
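The slope and intercept formulas can be checked with a short Python sketch. It assumes nothing beyond the eight (family size, credit cards) pairs in the table; the variable names are illustrative only.

```python
# Sample data from the table: y = # of credit cards, x = family size.
cards = [4, 6, 6, 7, 8, 7, 8, 10]
size = [2, 2, 4, 4, 5, 5, 6, 6]

n = len(cards)
x_bar = sum(size) / n       # 34 / 8 = 4.25
y_bar = sum(cards) / n      # 56 / 8 = 7

# Sum of cross-products and sum of squared x deviations.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, cards))
sxx = sum((x - x_bar) ** 2 for x in size)

b = sxy / sxx               # regression coefficient (slope)
a = y_bar - b * x_bar       # Y-intercept

print(round(b, 3), round(a, 2))
```

Because a is defined as $\bar{y} - b\bar{x}$, the fitted line always passes through the point $(\bar{x}, \bar{y})$.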
Simple and Multiple Regression Analysis
[Figure: scatter plot of # of credit cards (Y) against Family Size (X) for families F1 to F8, with the original (baseline) estimate $\bar{y}$ and the regression line $\hat{y} = 2.87 + .97x$ (new improved estimates)]
Can we tell how much estimation error we have committed by using the new regression line?
Yes, examine the differences between our households' actual # of CCs and their new/regression estimates.
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards (y) | Family Size (x) | Regression Estimate ($\hat{y}$) | Error (Residual) ($y - \hat{y}$) | Errors Squared ($(y - \hat{y})^2$)
1 | 4 | 2 | ? | ? | ?
2 | 6 | 2 | ? | ? | ?
3 | 6 | 4 | ? | ? | ?
4 | 7 | 4 | ? | ? | ?
5 | 8 | 5 | ? | ? | ?
6 | 7 | 5 | ? | ? | ?
7 | 8 | 6 | ? | ? | ?
8 | 10 | 6 | ? | ? | ?
$\hat{y} = 2.87 + .97x$
$\sum (y - \hat{y})^2 = ?$
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards (y) | Family Size (x) | Regression Estimate ($\hat{y}$) | Error (Residual) ($y - \hat{y}$) | Errors Squared ($(y - \hat{y})^2$)
1 | 4 | 2 | 4.81 | -.81 | .66
2 | 6 | 2 | 4.81 | 1.19 | 1.42
3 | 6 | 4 | 6.76 | -.76 | .58
4 | 7 | 4 | 6.76 | .24 | .06
5 | 8 | 5 | 7.73 | .27 | .07
6 | 7 | 5 | 7.73 | -.73 | .53
7 | 8 | 6 | 8.7 | -.7 | .49
8 | 10 | 6 | 8.7 | 1.3 | 1.69
$\hat{y} = 2.87 + .97x$; e.g., for Family 1: $\hat{y} = 2.87 + .97(2) = 4.81$
$\sum (y - \hat{y})^2 = 5.486$
SSE = Sum of Squares Error (SS Residual)
Simple and Multiple Regression Analysis
Total Baseline Error using the mean (SS Total) = 22.0
New or Remaining Error (SS Error or SS Residual) = 5.486 ~ 5.5
QUESTION: How much of the original estimation error have we explained away (eliminated) by using the regression model (instead of the mean)?
22 - 5.486 = 16.514 (SS Regression or SS Explained)
QUESTION: What % of estimation error have we explained (eliminated) by using the regression model?
16.514 / 22 = .751 or 75%. What is this called?
$R^2$ = % of differences in # of CCs among households that is explained by differences in their family size.
What does the remaining 25% represent?
Percent of variation (differences) in number of credit cards owned by families that can be accounted for by: (a) all other potential predictors not included in the model, beyond family size, and (b) unexplainable random/chance variations.
[Diagram: total variation in Y = 22, of which 16.5 overlaps with X1 (explained) and 5.5 does not (unexplained)]
Simple and Multiple Regression Analysis
$R^2$ is a measure of our success regarding accuracy of our estimation effort.
$R^2$ = % of estimation error that we have been able to explain away by using the regression model, instead of using the mean.
$R^2$ indicates how much better we can predict Y from information about Xs, rather than from using its own mean.
$R^2$ = % of differences (variations) in Y values that is explained by (attributable to) differences in X values.
$R^2$ = SS Regression / SS Total = 16.5 / 22 = 75%
Note: When dealing with only two variables (a single X and Y):
$r = \sqrt{R^2} = \sqrt{16.514 / 22} = \sqrt{.75} = .866$
This r is the Pearson correlation of Y with X1 (NOT controlling for any other var.).
Let's now examine all this graphically!
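The chain from residuals to SSE, SS Regression, $R^2$, and r can be sketched as follows. This is a minimal illustration on the sample data; `fit_line` is a helper defined here for the sketch, not something from the slides, and r is given the sign of the slope since a square root alone cannot tell a positive relationship from a negative one.

```python
import math

cards = [4, 6, 6, 7, 8, 7, 8, 10]   # y: actual # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]     # x: family size


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


a, b = fit_line(size, cards)
estimates = [a + b * x for x in size]              # regression estimates

y_bar = sum(cards) / len(cards)
sst = sum((y - y_bar) ** 2 for y in cards)         # baseline error (22)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(cards, estimates))
ssr = sst - sse                                    # error explained away
r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), b)         # r takes the slope's sign

print(round(sse, 3), round(ssr, 3), round(r_squared, 3), round(r, 3))
```

The unrounded coefficients give SSE of about 5.486; summing the rounded squared errors in the table gives about 5.5, which is the figure used on later slides.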
Simple and Multiple Regression Analysis
[Figure: scatter plot of # of credit cards (Y) against Family Size (X) for families F1 to F8, with the original (baseline) estimate $\bar{y}$ and the regression line $\hat{y} = 2.87 + .97x$ (new improved estimates). For family F1, the original baseline ERROR $(y - \bar{y})$ is split into the part explained by the REGRESSION model $(\hat{y} - \bar{y})$ and the new ERROR (unexplained/RESIDUAL) $(y - \hat{y})$.]
Simple and Multiple Regression Analysis
5.5 = SSE = The amount of estimation error for the 8 sample families when using simple regression (i.e., a regression model that includes only information about family size).
Can we reduce the amount of estimation error (SSE) to an even lower level and, thus, improve the estimation process? How?
Yes, by adding information on a second variable suspected to be strongly related to # of credit cards (e.g., family income, X2).
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards ($y_i$) | Family Size ($x_1$) | Family Income ($x_2$)
1 | 4 | 2 | 14
2 | 6 | 2 | 16
3 | 6 | 4 | 14
4 | 7 | 4 | 17
5 | 8 | 5 | 18
6 | 7 | 5 | 21
7 | 8 | 6 | 17
8 | 10 | 6 | 25
We now can attempt to estimate # of CCs from our information on family size and family income!
Our regression model will now be a linear plane, rather than a straight line!
Generic equation for a linear plane: $\hat{y} = a + b_1 x_1 + b_2 x_2$
Let's examine the regression plane for our example graphically.
MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:
$\hat{y} = a + b_1 x_1 + b_2 x_2 = .482 + .63x_1 + .216x_2$
Formulas are available for computing values of a, b1 and b2.
[Figure: three-dimensional plot of the regression plane, with Y = # of Credit Cards (0 to 12) against X1 = Family Size and X2 = Family Income, showing actual values and regression estimates]
Let's now see how much error in estimation we are committing by using this multiple regression model.
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards (y) | Family Size ($x_1$) | Family Income in $000 ($x_2$) | Regression Estimate ($\hat{y}$) | Error (Residual) ($y - \hat{y}$) | Errors Squared ($(y - \hat{y})^2$)
1 | 4 | 2 | 14 | ? | ? | ?
2 | 6 | 2 | 16 | ? | ? | ?
3 | 6 | 4 | 14 | ? | ? | ?
4 | 7 | 4 | 17 | ? | ? | ?
5 | 8 | 5 | 18 | ? | ? | ?
6 | 7 | 5 | 21 | ? | ? | ?
7 | 8 | 6 | 17 | ? | ? | ?
8 | 10 | 6 | 25 | ? | ? | ?
$\hat{y} = .482 + .63x_1 + .216x_2$
$\sum (y - \hat{y})^2 = ?$
Simple and Multiple Regression Analysis
Family Number (i) | Actual # of Credit Cards (y) | Family Size ($x_1$) | Family Income in $000 ($x_2$) | Regression Estimate ($\hat{y}$) | Error (Residual) ($y - \hat{y}$) | Errors Squared ($(y - \hat{y})^2$)
1 | 4 | 2 | 14 | 4.77 | -.77 | .59
2 | 6 | 2 | 16 | 5.20 | .80 | .64
3 | 6 | 4 | 14 | 6.03 | -.03 | .00
4 | 7 | 4 | 17 | 6.68 | .32 | .10
5 | 8 | 5 | 18 | 7.53 | .47 | .22
6 | 7 | 5 | 21 | 8.18 | -1.18 | 1.39
7 | 8 | 6 | 17 | 7.95 | .05 | .00
8 | 10 | 6 | 25 | 9.67 | .33 | .11
$\hat{y} = .482 + .63x_1 + .216x_2$; e.g., for Family 1: $\hat{y} = .482 + .63(2) + .216(14) = 4.77$
$\sum (y - \hat{y})^2 = 3.05$
SSE = Sum of Squares Error (Residual)
Unique (additional) contribution of X2 (family income) beyond X1 = ? 5.5 - 3.05 = 2.45
Simple and Multiple Regression Analysis
The MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:
$\hat{y} = .482 + .63x_1 + .216x_2$
Y-Intercept, a
(NOTE: Only when all Xs can meaningfully take on a value of zero will the intercept have a meaningful/direct/practical interpretation. Otherwise, it is simply an aid in increasing accuracy of estimation.)
b1 and b2 = Regression Coefficients
0.63: Among families of the same income, an increase in family size by one person would, on average, result in .63 more credit cards.
0.216: Among families of the same size, an income increase of $1,000 results in an average increase of about 0.2 credit cards.
bs represent the effect of each X on Y when all other Xs are controlled for/held constant/taken into account, i.e., after impacts of all other variables are accounted for (remember the high blood pressure-hearing problem connection?).
Simple and Multiple Regression Analysis
The MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:
$\hat{y} = .482 + .63x_1 + .216x_2$
SST = 22, SSE = 3.05
What is our new $R^2$?
SS Regression = 22 - 3.05 = 18.95
$R^2$ = 18.95 / 22 = .861 or 86%
Percent of differences in households' number of CCs that is explained by differences in family size and family income.
The Remaining 14%? (3.05 / 22 = .14)
Percent of variation in number of credit cards that can be accounted for by (a) all other relevant factors not included in the model, beyond family size and income, and (b) unexplainable random/chance variations.
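The slides note that formulas exist for a, b1, and b2 without showing them. One way to obtain them, sketched here under the assumption of ordinary least squares with exactly two predictors, is to solve the two normal equations written in terms of deviations from the means. The helper names `mean` and `cross` are invented for this sketch.

```python
cards = [4, 6, 6, 7, 8, 7, 8, 10]          # y: # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]            # x1: family size
income = [14, 16, 14, 17, 18, 21, 17, 25]  # x2: family income ($000)


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((p - u_bar) * (q - v_bar) for p, q in zip(u, v))


s11, s22, s12 = cross(size, size), cross(income, income), cross(size, income)
s1y, s2y = cross(size, cards), cross(income, cards)

# Solve the 2x2 normal equations by Cramer's rule.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
a = mean(cards) - b1 * mean(size) - b2 * mean(income)

estimates = [a + b1 * x1 + b2 * x2 for x1, x2 in zip(size, income)]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(cards, estimates))
sst = cross(cards, cards)
r_squared = 1 - sse / sst

print(round(a, 3), round(b1, 3), round(b2, 3), round(sse, 2), round(r_squared, 3))
```

With more than two predictors the same idea is normally expressed in matrix form and handed to statistical software, which is what the SPSS exercises later in the deck do.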
[Venn diagram: circle Y = # of CC overlapped by circles X1 = Family Size and X2 = Family Income. Area a = part of Y explained by X1 only, b = explained by X2 only, c = explained by both X1 and X2, d = unexplained.]
Total Variation/Error in Y = SS Total = a + b + c + d = 22
Pearson/simple correlation of Y with X1 (not controlling for X2):
$r_{yx_1} = \sqrt{\frac{a + c}{a + b + c + d}} = \sqrt{\frac{16.5}{22}} = \sqrt{0.75} = 0.866$
Simple regression on X1: $\hat{y} = 2.87 + .97X_1$; SSR = a + c = 16.5; $r^2 = R^2$ = (a + c) / (a + b + c + d) = 16.5 / 22 = 0.75
Pearson/simple correlation of Y with X2 (not controlling for X1):
$r_{yx_2} = \sqrt{\frac{b + c}{a + b + c + d}} = \sqrt{\frac{15.12}{22}} = 0.829$
Simple regression on X2: $\hat{y} = -.063 + .398X_2$; SSR = c + b = 15.12; $r^2$ = (b + c) / (a + b + c + d) = 15.12 / 22 = 0.687
What do we call the square root of this?
$R^2$ Graphically = ?
[Venn diagram: circle Y overlapped by circles X1 = Family Size and X2 = Family Income, with areas a, b, c, and d]
$\hat{y} = .482 + .63x_1 + .216x_2$
SSR = a + b + c = 18.95
SST = a + b + c + d = 22
$R^2$ = SSR / SST = (a + b + c) / (a + b + c + d) = 18.95 / 22 = 86%
SSE = ?
SSE = d = 22 - 18.95 = 3.05
NOTE: c is explained by both X1 and X2
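The four areas of the diagram can be recovered from three sums of squares: SS Regression for X1 alone, for X2 alone, and for both together. The sketch below is illustrative only; the names `area_a` to `area_d` are invented to match the diagram labels, and it relies on two standard least-squares identities (SSR for a single predictor is the squared cross-product sum divided by the predictor's sum of squares, and SSR for the full model is $b_1 S_{1y} + b_2 S_{2y}$).

```python
cards = [4, 6, 6, 7, 8, 7, 8, 10]          # y
size = [2, 2, 4, 4, 5, 5, 6, 6]            # x1
income = [14, 16, 14, 17, 18, 21, 17, 25]  # x2


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((p - u_bar) * (q - v_bar) for p, q in zip(u, v))


sst = cross(cards, cards)
s11, s22, s12 = cross(size, size), cross(income, income), cross(size, income)
s1y, s2y = cross(size, cards), cross(income, cards)

ssr_size = s1y ** 2 / s11        # X1 alone: a + c
ssr_income = s2y ** 2 / s22      # X2 alone: b + c

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
ssr_both = b1 * s1y + b2 * s2y   # X1 and X2 together: a + b + c

area_a = ssr_both - ssr_income               # unique to family size
area_b = ssr_both - ssr_size                 # unique to family income
area_c = ssr_size + ssr_income - ssr_both    # shared by both predictors
area_d = sst - ssr_both                      # unexplained (SSE)

print(round(area_a, 2), round(area_b, 2), round(area_c, 2), round(area_d, 2))
```

Area b computed this way is about 2.44, in line with the 5.5 - 3.05 = 2.45 figure for the unique contribution of X2 (the small gap comes from rounding SSE to 5.5). In other data sets the shared area can come out negative, so the Venn picture is best read as an aid rather than a literal decomposition.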
Simple and Multiple Regression Analysis
Remember: with $\hat{y} = .482 + .63x_1 + .216x_2$, SSE = $\sum (y - \hat{y})^2$ = 3.05 (Sum of Squares Error, or Residual).
Unique (additional) contribution of X2 = 5.5 - 3.05 = 2.45
Exercise 1: Redo the credit card analysis with SPSS.
First, Correlations and Simple Regression
Next, Multiple Regression (also ask for part and partial correlations.)
SPSS CREDIT CARD FILE
http://localhost/var/www/apps/conversion/tmp/scratch_5//datastore01/website/mqm497/MQM%20497%20SPSS%20Data%20Files/MQM497%20Credit%20Card%20Regression%20Model.sav
Simple and Multiple Regression Analysis
EXERCISE 2: Using the gss_2 data file, we are interested in understanding the role that the following demographics (age, educ, sibs, agewed), as well as respondent income (rincmdol), job satisfaction (satjob_2), and marriage satisfaction (hapmar_2) play in determining/predicting one's general happiness (happy_2).
We also wish to know which of the above variables is the strongest predictor of general happiness (Standardized Reg. Coefficients).
Use the gss_2 data file and conduct the appropriate analysis.
NOTE:
satjob_2 is coded as:
1 = Very Dissatisfied
2 = A Little Dissatisfied
3 = Pretty Satisfied
4 = Very Satisfied
hapmar_2 is coded as:
1 = Not Too Happy
2 = Pretty Happy
3 = Very Happy
http://localhost/var/www/apps/conversion/tmp/scratch_5//datastore01/website/mqm497/MQM%20497%20SPSS%20Data%20Files/gss_2%20(gss%20with%20happy,%20satjob,%20%20and%20hapmar%20reverse%20coded%20for%20497).sav
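Exercise 2 relies on standardized regression coefficients to compare predictors measured on different scales. Since the gss_2 file is not reproduced here, the idea is sketched on the credit card example instead: each unstandardized b is rescaled by the ratio of its predictor's standard deviation to the standard deviation of Y. The helper `cross` and the names `beta_size` and `beta_income` are illustrative assumptions, not SPSS output.

```python
import math

cards = [4, 6, 6, 7, 8, 7, 8, 10]          # y
size = [2, 2, 4, 4, 5, 5, 6, 6]            # x1
income = [14, 16, 14, 17, 18, 21, 17, 25]  # x2


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((p - u_bar) * (q - v_bar) for p, q in zip(u, v))


s11, s22, s12 = cross(size, size), cross(income, income), cross(size, income)
s1y, s2y, syy = cross(size, cards), cross(income, cards), cross(cards, cards)

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det          # unstandardized coefficients
b2 = (s11 * s2y - s12 * s1y) / det

# beta_j = b_j * (sd of x_j / sd of y); the (n - 1) terms cancel in the ratio.
beta_size = b1 * math.sqrt(s11 / syy)
beta_income = b2 * math.sqrt(s22 / syy)

print(round(beta_size, 3), round(beta_income, 3))
```

On these eight families family size has the larger standardized coefficient, even though the raw coefficients (.63 per person versus .216 per $1,000) are not directly comparable.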
Simple and Multiple Regression Analysis
EXAMPLE 1: Income = 24000 + 1400 Gender
Coded: Female = 0, Male = 1
Meaning?
Average income of females is $24,000.
Males, on average, make $1,400 more than females.
MULTIPLE REGRESSION EXAMPLE 2: Income = 12000 + 1000 Education Years + 800 Gender
Coded: Female = 0, Male = 1
Meaning?
Average income of females with no education is $12,000.
Among people of the same gender, every additional year of education results in an average additional income of $1,000.
Males make, on average, $800 more in comparison with females who have the same number of years of education.
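The interpretations of the two dummy-variable equations can be confirmed by plugging in the codes. The functions below simply restate the two equations from the examples; the function and parameter names are made up for this sketch.

```python
def income_example_1(male):
    """Example 1: Income = 24000 + 1400 * Gender (Female = 0, Male = 1)."""
    return 24000 + 1400 * male


def income_example_2(education_years, male):
    """Example 2: Income = 12000 + 1000 * Education Years + 800 * Gender."""
    return 12000 + 1000 * education_years + 800 * male


# Example 1: the intercept is the female average; the coefficient is the gap.
female_avg = income_example_1(0)
male_avg = income_example_1(1)

# Example 2: compare a male and a female with the same education (16 years),
# and two females who differ by one year of education.
gender_gap = income_example_2(16, 1) - income_example_2(16, 0)
education_step = income_example_2(13, 0) - income_example_2(12, 0)

print(female_avg, male_avg, gender_gap, education_step)
```

Setting the dummy to 0 leaves only the other terms, which is why the intercept in Example 1 is the average for the group coded 0.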
Exercise 4: Suppose we are interested in knowing what role, if any, demographic characteristics (i.e., age, sex_Dummy, educ, sibs, agewed, incomdol), as well as job satisfaction (satjob-2), and marriage satisfaction (hapmar-2) play in determining one's overall happiness in life (happy-2).
Use the gss_2 data file and conduct the appropriate analysis.
Exercise 3: Suppose we are interested in knowing what role, if any, the following demographic characteristics play in determining one's income (rincmdol):
Age,
Sex_Dummy (0 = male, 1 = female),
Age first married (agewed),
Years of education completed (educ), and
Political party affiliation (republic: 0 = Democrat, 1 = Republican).
Use the gss_2 data file and conduct the appropriate analysis.
Assignment 5
Data file Salary.sav contains information about 474 employees hired by a Midwestern bank between 1969 and 1971 (NOTE: Due to SPSS site license restrictions, this hyperlink will not work if you are off campus). Of the 474 employees, 258 were men, 216 women, 370 white, and 104 non-white. The bank was subsequently involved in EEOC litigation; the bank was accused of gender and race discrimination in its hiring and compensation practices. The two issues that were of particular interest in the litigation were alleged gender and racial inequalities not only in the bank's beginning salaries (variable salbeg), but also in its later salaries (variable salnow).
1. Print, examine, and interpret correlation coefficients between beginning salary (salbeg) and age in years (age), education in years (edlevel), employment category or job classification level, rated from 1 = lowest to 8 = highest (jobcat), and work experience in months (work).
2. Conduct the appropriate analysis to see: (a) What role did each of the variables age, education (edlevel), employment category (jobcat), and work experience (work) play, holding all other variables constant, in determining the bank's beginning salaries? For example, what was the differential pay for one additional year of education among new hires who otherwise had the same age, employment category, and work experience? (b) Which of the above demographic characteristics had the strongest influence on beginning pay? How can you tell? (c) What percent of the differences in employees' beginning salaries can be explained by/attributed to differences in all of the above characteristics?
http://www.cob.ilstu.edu/udrive/MQM/MQM%20497/Hemmasi/MQM497_Data_Files/SALARY.sav
Assignment 5
3. Now conduct the appropriate analysis to indicate, holding all other variables constant, what role gender (sex, male = 0, female = 1) played in determining beginning salaries at the bank. That is, what was the differential beginning pay between male and female employees who otherwise had the same age, education, employment category, and work experience? Does this evidence support the charges of gender discrimination in the bank's practices regarding initial compensation?
4. During litigation, it was charged that the bank's unfair compensation practices had continued beyond its initial salary decisions. That is, the prosecution claimed that with time, not only did the beginning salary disparities between men and women not shrink, but they further widened. Conduct the appropriate analysis to indicate (a) everything else being equal, what role gender played in determining employees' later salaries at the bank (salnow). That is, what was the average differential pay between male and female employees who otherwise had the same age, education, employment category, work experience, and job seniority (variable time represents seniority in terms of number of months employed at the bank)? (b) Compare the later pay disparities you have just identified with the beginning pay disparities you had found in question 3 above to explain if the evidence supports the prosecution's charges of continued gender discrimination beyond initial salary decisions, resulting in widening disparities in later pay.
NOTE: For each question, provide thorough explanations on corresponding pages and parts of your printout.
Simple and Multiple Regression Analysis
QUESTIONS OR COMMENTS?