12.Simple and Multiple Regression Analysis-LDR 280

  • 7/28/2019 12.Simple and Multiple Regression Analysis-LDR 280

    1/48

    Regression Analysis

    $\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$

    [Diagram: X1, X2, X3, and y]

    STATISTICAL DATA ANALYSIS: COMMON TYPES OF ANALYSIS

    1. Compare groups

       a. Compare proportions (e.g., chi-square test, $\chi^2$). H0: P1 = P2 = P3 = ... = Pk

       b. Compare means (e.g., analysis of variance). H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

    2. Examine strength and direction of relationships

       a. Bivariate (e.g., Pearson correlation, r). Between one variable and another: Y = a + b1 x1

       b. Multivariate (e.g., multiple regression analysis). Between one dependent variable and each of several independent variables, while holding all other independent variables constant:

          Y = a + b1 x1 + b2 x2 + b3 x3 + ... + bk xk

    What does regression analysis do?

    Simple and multiple regression analysis examines whether changes/differences in the values of one variable (the dependent variable, Y) are linked to changes/differences in the values of one or more other variables (the independent variables X1, X2, etc.), while controlling for changes in the values of all other Xs.

    E.g., the relationship between salary and gender for people who have the same levels of education, work experience, position level, seniority, etc.

    The DV (Y) must be metric.

    The IVs (Xs) must be either metric or dummy variables.

    Central question addressed:

    Is Y a function of X1, X2, etc.? How? Is there a relationship between Y and X1, X2, etc. (in each case, after controlling for the effects of all other Xs)? In what way?

    What is the relative impact of each X on Y, holding all other Xs constant (that is, all other Xs being equal)?

    More specifically,

    Do values of Y tend to increase/decrease as values of X1, X2, etc. increase/decrease?

    If so, by how much?

    And how strong is the connection/relationship between the Xs and Y? That is, what % of the differences/variations in Y values (e.g., income) among study subjects can be explained by (or attributed to) differences in X values (e.g., years of education, years of experience, etc.)?

    [Diagram: X1, X2, X3, and y]

    NOTE: Once we can determine how values of Y change as a function of values of X1, X2, etc., we will also be able to predict/estimate the value of Y from specific values of X1, X2, etc.

    Y = a + b1 x1 + b2 x2 + b3 x3 + ... + bk xk + ε

    Therefore, regression analysis, in a sense, is about ESTIMATING values of Y, using information about values of the Xs.

    Estimation, by definition, involves?

    The objective?

    To minimize error in estimation, or, to compute estimates that are as close to the true/actual values as possible.


    Estimating Number of Credit Cards

    Family (i)   Actual # of cards   Estimate   Error in estimation
    1            4                   7          ?
    2            6                   7          ?
    3            6                   7          ?
    4            7                   7          ?
    5            8                   7          ?
    6            7                   7          ?
    7            8                   7          ?
    8            10                  7          ?

    $\sum y_i = 56$, so the estimate is the mean: $\hat{y} = \bar{y} = 56 / 8 = 7$. The error in estimation is $y_i - \bar{y}$.

    Estimating Number of Credit Cards

    Family (i)   Actual # of cards   Estimate   Error in estimation
    1            4                   7          -3
    2            6                   7          -1
    3            6                   7          -1
    4            7                   7          0
    5            8                   7          +1
    6            7                   7          0
    7            8                   7          +1
    8            10                  7          +3

    $\sum y_i = 56$, $\hat{y} = \bar{y} = 56 / 8 = 7$, error in estimation $= y_i - \bar{y}$

    Let's now see all this graphically.

    [Figure: actual number of credit cards (Y, axis from 0 to 10) for families F1 to F8, plotted against the estimate $\hat{y} = \bar{y} = 7$.]

    Let's spread the dots away from each other to see things more clearly!

    Graphic Representation

    [Figure: the same plot with families F1 to F8 spread out horizontally. Each family's estimation error is the vertical distance between its actual value and the estimate.]

    Can we determine the total estimation error for all 8 families?

    Family (i)   Actual # of cards   Estimate   Error in estimation
    1            4                   7          -3
    2            6                   7          -1
    3            6                   7          -1
    4            7                   7          0
    5            8                   7          +1
    6            7                   7          0
    7            8                   7          +1
    8            10                  7          +3

    $\sum y_i = 56$, $\bar{y} = 56 / 8 = 7$

    What would be the total estimation error for all 8 families combined?

    $\sum (y_i - \bar{y}) = 0$

    Solution?

    Estimating Number of Credit Cards

    Family (i)   Actual # of cards   Estimate   Error in estimation   Error squared
    1            4                   7          -3                    9
    2            6                   7          -1                    1
    3            6                   7          -1                    1
    4            7                   7          0                     0
    5            8                   7          +1                    1
    6            7                   7          0                     0
    7            8                   7          +1                    1
    8            10                  7          +3                    9

    $\sum y_i = 56$, $\bar{y} = 56 / 8 = 7$, $\sum (y_i - \bar{y}) = 0$

    $\sum (y_i - \bar{y})^2 = 22$ = SST = Sum of Squares Total
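    The baseline calculation in the table can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and the slides themselves use SPSS rather than Python.

```python
# Credit card counts for the 8 sample families (from the table above).
cards = [4, 6, 6, 7, 8, 7, 8, 10]

# Baseline estimate: the sample mean, 56 / 8 = 7.
mean_cards = sum(cards) / len(cards)

# Error in estimation for each family: actual value minus the mean.
errors = [y - mean_cards for y in cards]

# Signed errors cancel out, so their plain sum says nothing about total error.
total_error = sum(errors)

# Squaring removes the signs; the result is the Sum of Squares Total (SST).
sst = sum(e ** 2 for e in errors)

print(mean_cards, total_error, sst)
```

    Squaring the errors before summing is what keeps the positive and negative deviations from cancelling each other out.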

    SST = 22 is an index of the total (combined) amount of estimation error for all families (observations) in the sample when the mean is used as the estimate.

    SST is also the sum of squared deviations from the mean.

    o Remember the formula for computing variance?

    Objective in estimation?

    Minimize error, maximize precision.

    Can we cut down the amount of estimation error (SST)? How?

    Yes, we can, by using information about other variables suspected to be strong predictors of (strongly related to) the # of credit cards possessed by families (e.g., family size, family income, etc.).

    Family (i)   Actual # of cards (y)   Family size (x)
    1            4                       2
    2            6                       2
    3            6                       4
    4            7                       4
    5            8                       5
    6            7                       5
    7            8                       6
    8            10                      6

    We can now attempt to estimate # of credit cards from the information on family size, rather than from its own mean.

    Let's first see this graphically!


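    [Figure: scatterplot of number of credit cards (Y, 0 to 10) against family size (X, 1 to 7) for families F1 to F8. The horizontal line $\hat{y} = a + 0x = \bar{y}$ is the original (baseline) estimate. Candidate straight lines $\hat{y} = a_1 + b_1 x$, $\hat{y} = a_2 + b_2 x$, and $\hat{y} = a_3 + b_3 x$ are drawn through the points; one of them is the regression line.]

    Generic equation for any straight line: Y = a + bx

    Regression line (line of best fit): the new, improved location for the CC estimates (see next slide).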

    [Figure: scatterplot of number of credit cards (Y) against family size (X) for families F1 to F8, showing the original (baseline) estimate $\bar{y}$ and the regression line (line of best fit) $\hat{y} = a + bx$, the new improved location for the CC estimates. The estimation error for each family is $(y - \hat{y})$.]

    The regression line will minimize $\sum (y - \hat{y})^2$ = total estimation error.

    But how do we know the values of a and b in $\hat{y} = a + bx$ (the regression line)?

    EQUATION FOR REGRESSION LINE (LINE OF BEST FIT)

    $\hat{y} = a + bx$ (y = actual # of credit cards)

    Values of a and b for the regression line:

    $b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

    $a = \bar{y} - b\bar{x}$

    Let's use the above formulas to compute the values of a and b for the regression line in our example.

    We will need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$.

    We need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$.

    Here y = actual # of credit cards and x = family size.

    Family (i)   y    x    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$
    1            4    2    ?             ?             ?                          ?
    2            6    2    ?             ?             ?                          ?
    3            6    4    ?             ?             ?                          ?
    4            7    4    ?             ?             ?                          ?
    5            8    5    ?             ?             ?                          ?
    6            7    5    ?             ?             ?                          ?
    7            8    6    ?             ?             ?                          ?
    8            10   6    ?             ?             ?                          ?

    $\bar{y} = 56 / 8 = 7$, $\bar{x} = 34 / 8 = 4.25$

    $\sum (x - \bar{x})(y - \bar{y}) = ?$, $\sum (x - \bar{x})^2 = ?$

    We need: $\bar{y}$, $\bar{x}$, $\sum (x - \bar{x})(y - \bar{y})$, and $\sum (x - \bar{x})^2$.

    Here y = actual # of credit cards and x = family size.

    Family (i)   y    x    $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $(x-\bar{x})^2$
    1            4    2    -2.25         -3            6.75                       5.0625
    2            6    2    -2.25         -1            2.25                       5.0625
    3            6    4    -.25          -1            .25                        .0625
    4            7    4    -.25          0             0                          .0625
    5            8    5    .75           1             .75                        .5625
    6            7    5    .75           0             0                          .5625
    7            8    6    1.75          1             1.75                       3.0625
    8            10   6    1.75          3             5.25                       3.0625

    $\bar{y} = 56 / 8 = 7$, $\bar{x} = 34 / 8 = 4.25$

    $\sum (x - \bar{x})(y - \bar{y}) = 17$, $\sum (x - \bar{x})^2 = 17.5$

    REGRESSION LINE (LINE OF BEST FIT): $\hat{y} = a + bx$

    $b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{17}{17.5} = .971$ (regression coefficient)

    $a = \bar{y} - b\bar{x} = 7 - .971(4.25) = 2.87$ (Y-intercept)

    a = 2.87, b = .97

    $\hat{y} = 2.87 + .97x$
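    As a cross-check on the hand calculation, the slope and intercept formulas can be sketched in plain Python. The variable names (`cards`, `size`, `sxy`, `sxx`) are illustrative choices, not part of the original slides.

```python
# Sample data from the credit card example.
cards = [4, 6, 6, 7, 8, 7, 8, 10]   # y: actual # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]     # x: family size

n = len(cards)
x_bar = sum(size) / n               # 34 / 8 = 4.25
y_bar = sum(cards) / n              # 56 / 8 = 7

# Sum of cross-products of deviations, and sum of squared x deviations.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, cards))
sxx = sum((x - x_bar) ** 2 for x in size)

b = sxy / sxx                       # regression coefficient (slope)
a = y_bar - b * x_bar               # Y-intercept

print(round(a, 2), round(b, 2))
```

    With unrounded arithmetic the slope is 17 / 17.5 and the intercept follows from it, which agrees with the rounded values a = 2.87 and b = .97 above.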

    [Figure: scatterplot of number of credit cards (Y) against family size (X) for families F1 to F8, showing the original (baseline) estimate $\bar{y}$ and the regression line $\hat{y} = 2.87 + .97x$, which gives the new improved estimates.]

    Can we tell how much estimation error we have committed by using the new regression line?

    Yes, examine the differences between our households' actual # of CCs and their new/regression estimates.

    $\hat{y} = 2.87 + .97x$, where y = actual # of credit cards and x = family size

    Family (i)   y    x    Regression estimate   Error (residual)   Error squared
    1            4    2    ?                     ?                  ?
    2            6    2    ?                     ?                  ?
    3            6    4    ?                     ?                  ?
    4            7    4    ?                     ?                  ?
    5            8    5    ?                     ?                  ?
    6            7    5    ?                     ?                  ?
    7            8    6    ?                     ?                  ?
    8            10   6    ?                     ?                  ?

    Regression estimate $= \hat{y}$, error (residual) $= y - \hat{y}$, error squared $= (y - \hat{y})^2$

    $\sum (y - \hat{y})^2 = ?$

    $\hat{y} = 2.87 + .97x$; e.g., for family 1, $\hat{y} = 2.87 + .97(2) = 4.81$

    Family (i)   y    x    Regression estimate   Error (residual)   Error squared
    1            4    2    4.81                  -.81               .66
    2            6    2    4.81                  1.19               1.42
    3            6    4    6.76                  -.76               .58
    4            7    4    6.76                  .24                .06
    5            8    5    7.73                  .27                .07
    6            7    5    7.73                  -.73               .53
    7            8    6    8.7                   -.7                .49
    8            10   6    8.7                   1.3                1.69

    Regression estimate $= \hat{y}$, error (residual) $= y - \hat{y}$, error squared $= (y - \hat{y})^2$

    $\sum (y - \hat{y})^2 = 5.486$ = SSE = Sum of Squares Error (SS Residual)
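    The residual table can be reproduced with a short Python sketch. It refits the line from the raw data instead of using the rounded coefficients, so individual residuals may differ from the table in the second decimal place. All names are illustrative.

```python
# Sample data from the credit card example.
cards = [4, 6, 6, 7, 8, 7, 8, 10]   # y: actual # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]     # x: family size

n = len(cards)
x_bar = sum(size) / n
y_bar = sum(cards) / n

# Least-squares slope and intercept (same formulas as on the earlier slide).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, cards)) / sum(
    (x - x_bar) ** 2 for x in size
)
a = y_bar - b * x_bar

# Regression estimate and residual (actual minus estimate) for each family.
estimates = [a + b * x for x in size]
residuals = [y - y_hat for y, y_hat in zip(cards, estimates)]

# Sum of Squares Error: the estimation error left after using family size.
sse = sum(e ** 2 for e in residuals)

print(round(sse, 3))
```

    Summing the rounded squared errors from the table gives 5.50; the 5.486 figure comes from carrying full precision through the calculation, as this sketch does.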

    Total baseline error using the mean (SS Total): 22.0

    New or remaining error (SS Error or SS Residual): 5.486 ≈ 5.5

    QUESTION: How much of the original estimation error have we explained away (eliminated) by using the regression model (instead of the mean)?

    22 - 5.486 = 16.514 (SS Regression or SS Explained)

    QUESTION: What % of the estimation error have we explained (eliminated) by using the regression model?

    16.514 / 22 = .751, or 75%. What is this called? $R^2$

    $R^2$ = % of differences in # of CCs among households that is explained by differences in their family size.

    What does the remaining 25% represent?

    The percent of variation (differences) in the number of credit cards owned by families that can be accounted for by (a) all other potential predictors not included in the model, beyond family size, and (b) unexplainable random/chance variations.

    [Diagram: overlap of X1 and Y. Total variation in Y = 22, of which 16.5 is shared with X1 and 5.5 is not.]

    $R^2$ is a measure of our success regarding the accuracy of our estimation effort.

    $R^2$ = % of estimation error that we have been able to explain away by using the regression model instead of the mean.

    $R^2$ indicates how much better we can predict Y from information about the Xs than from its own mean.

    $R^2$ = % of differences (variations) in Y values that is explained by (attributable to) differences in X values.

    $R^2$ = SS Regression / SS Total = 16.5 / 22 = 75%

    Note: When dealing with only two variables (a single X and Y):

    $r = \sqrt{R^2} = \sqrt{16.514 / 22} = \sqrt{.75} = .866$, the Pearson correlation of Y with X1 (NOT controlling for any other variable).

    Let's now examine all this graphically!
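    The link between $R^2$ and the Pearson correlation can be checked numerically. The sketch below computes r directly from the deviations and compares its square with SS Regression / SS Total. The names are illustrative, and the square-root shortcut recovers only the magnitude of r; its sign is that of the slope.

```python
import math

# Sample data from the credit card example.
cards = [4, 6, 6, 7, 8, 7, 8, 10]   # y: actual # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]     # x: family size

n = len(cards)
x_bar = sum(size) / n
y_bar = sum(cards) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(size, cards))
sxx = sum((x - x_bar) ** 2 for x in size)
sst = sum((y - y_bar) ** 2 for y in cards)    # SS Total = 22

# Pearson correlation computed directly from the deviations.
r = sxy / math.sqrt(sxx * sst)

# SS Regression for a single predictor, then R squared.
ss_regression = sxy ** 2 / sxx                # about 16.514
r_squared = ss_regression / sst               # about .75

print(round(r, 3), round(r_squared, 3))
```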

    [Figure: scatterplot of number of credit cards (Y) against family size (X) for families F1 to F8, showing the original (baseline) estimate $\bar{y}$ and the regression line (new improved estimates) $\hat{y} = 2.87 + .97x$. For family F1, the original baseline error $y - \bar{y}$ is split into the part explained by the regression model, $\hat{y} - \bar{y}$, and the new (unexplained/residual) error, $y - \hat{y}$.]

    SSE = 5.5 is the amount of estimation error for the 8 sample families when using simple regression (i.e., a regression model that includes only information about family size).

    Can we reduce the amount of estimation error (SSE) to an even lower level and thus improve the estimation process? How?

    Yes, by adding information on a second variable suspected to be strongly related to # of credit cards (e.g., family income, X2).

    Family (i)   Actual # of cards (y)   Family size (x1)   Family income (x2)
    1            4                       2                  14
    2            6                       2                  16
    3            6                       4                  14
    4            7                       4                  17
    5            8                       5                  18
    6            7                       5                  21
    7            8                       6                  17
    8            10                      6                  25

    We can now attempt to estimate # of CCs from our information on family size and family income!

    Our regression model will now be a linear plane, rather than a straight line!

    Generic equation for a linear plane: $\hat{y} = a + b_1 x_1 + b_2 x_2$

    Let's examine the regression plane for our example graphically.

    MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:

    $\hat{y} = a + b_1 x_1 + b_2 x_2$

    $\hat{y} = .482 + .63 x_1 + .216 x_2$

    Formulas are available for computing the values of a, b1, and b2.

    [Figure: three-dimensional plot of Y = # of credit cards (0 to 12) against X1 = family size and X2 = family income, showing the actual values and the regression estimates on the regression plane.]

    Let's now see how much error in estimation we are committing by using this multiple regression model.

    $\hat{y} = .482 + .63 x_1 + .216 x_2$, where y = actual # of credit cards, x1 = family size, and x2 = family income ($000)

    Family (i)   y    Size   Income ($000)   Regression estimate   Error (residual)   Error squared
    1            4    2      14              ?                     ?                  ?
    2            6    2      16              ?                     ?                  ?
    3            6    4      14              ?                     ?                  ?
    4            7    4      17              ?                     ?                  ?
    5            8    5      18              ?                     ?                  ?
    6            7    5      21              ?                     ?                  ?
    7            8    6      17              ?                     ?                  ?
    8            10   6      25              ?                     ?                  ?

    Regression estimate $= \hat{y}$, error (residual) $= y - \hat{y}$, error squared $= (y - \hat{y})^2$

    $\sum (y - \hat{y})^2 = ?$

    $\hat{y} = .482 + .63 x_1 + .216 x_2$; e.g., for family 1, $\hat{y} = .482 + .63(2) + .216(14) = 4.77$

    Here y = actual # of credit cards, x1 = family size, and x2 = family income ($000).

    Family (i)   y    Size   Income ($000)   Regression estimate   Error (residual)   Error squared
    1            4    2      14              4.77                  -.77               .59
    2            6    2      16              5.20                  .80                .64
    3            6    4      14              6.03                  -.03               .00
    4            7    4      17              6.68                  .32                .10
    5            8    5      18              7.53                  .47                .22
    6            7    5      21              8.18                  -1.18              1.39
    7            8    6      17              7.95                  .05                .00
    8            10   6      25              9.67                  .33                .11

    Regression estimate $= \hat{y}$, error (residual) $= y - \hat{y}$, error squared $= (y - \hat{y})^2$

    $\sum (y - \hat{y})^2 = 3.05$ = SSE = Sum of Squares Error (Residual)

    Unique (additional) contribution of X2 (family income) beyond X1 = ? 5.5 - 3.05 = 2.45
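    The slides note that formulas exist for a, b1, and b2 without showing them. One standard closed form for exactly two predictors, written in terms of sums of deviation cross-products, is sketched below in plain Python. The helper name `dev_cross` and all variable names are assumptions made for illustration; SPSS reports the same coefficients directly.

```python
# Sample data from the credit card example.
cards = [4, 6, 6, 7, 8, 7, 8, 10]            # y: actual # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]              # x1: family size
income = [14, 16, 14, 17, 18, 21, 17, 25]    # x2: family income ($000)


def mean(values):
    return sum(values) / len(values)


def dev_cross(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


s11 = dev_cross(size, size)
s22 = dev_cross(income, income)
s12 = dev_cross(size, income)
s1y = dev_cross(size, cards)
s2y = dev_cross(income, cards)

# Solve the two normal equations for the two slopes, then the intercept.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
a = mean(cards) - b1 * mean(size) - b2 * mean(income)

# Residual sum of squares for the fitted plane.
sse = sum(
    (y - (a + b1 * x1 + b2 * x2)) ** 2
    for y, x1, x2 in zip(cards, size, income)
)

print(round(a, 3), round(b1, 3), round(b2, 3), round(sse, 2))
```

    This closed form only covers the two-predictor case; with more predictors the usual route is a matrix least-squares solver or a statistics package.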

    The MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:

    $\hat{y} = .482 + .63 x_1 + .216 x_2$

    .482 = Y-intercept, a

    (NOTE: Only when all Xs can meaningfully take on the value of zero will the intercept have a meaningful/direct/practical interpretation. Otherwise, it is simply an aid in increasing the accuracy of estimation.)

    .63 and .216 = b1 and b2 = regression coefficients

    0.63: Among families of the same income, an increase in family size by one person would, on average, result in .63 more credit cards.

    0.216: Among families of the same size, an income increase of $1,000 results in an average increase of about 0.2 credit cards.

    The bs represent the effect of each X on Y when all other Xs are controlled for/held constant/taken into account, i.e., after the impacts of all other variables are accounted for (remember the high blood pressure-hearing problem connection?).

    The MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:

    $\hat{y} = .482 + .63 x_1 + .216 x_2$

    SST = 22, SSE = 3.05

    What is our new $R^2$?

    SS Regression = 22 - 3.05 = 18.95

    $R^2$ = 18.95 / 22 = .861, or 86%

    This is the percent of differences in households' number of CCs that is explained by differences in family size and family income.

    The remaining 14%? (3.05 / 22 = .14)

    The percent of variation in the number of credit cards that can be accounted for by (a) all other relevant factors not included in the model, beyond family size and income, and (b) unexplainable random/chance variations.

    [Diagram: overlapping circles for Y = # of CC, X1 = family size, and X2 = family income. Within Y, region a overlaps X1 only, region b overlaps X2 only, region c overlaps both X1 and X2, and region d overlaps neither.]

    Total variation/error in Y = SS Total = a + b + c + d = 22

    Y with X1 = family size (not controlling for X2):

    $\hat{y} = 2.87 + .97 X_1$

    SSR = a + c = 16.5

    $r^2 = R^2$ = (a + c) / (a + b + c + d) = 16.5 / 22 = 0.75

    What do we call the square root of this? The Pearson/simple correlation of Y with X1:

    $r_{yx_1} = \sqrt{\dfrac{a + c}{a + b + c + d}} = \sqrt{\dfrac{16.5}{22}} = \sqrt{0.75} = 0.867$

    Y with X2 = family income (not controlling for X1):

    $\hat{y} = -0.063 + 0.398 X_2$

    SSR = c + b = 15.12

    $r^2$ = (b + c) / (a + b + c + d) = 15.12 / 22 = 0.687

    The Pearson/simple correlation of Y with X2:

    $r_{yx_2} = \sqrt{\dfrac{b + c}{a + b + c + d}} = \sqrt{\dfrac{15.12}{22}} = 0.829$

    $R^2$ graphically = ?

    [Diagram: the same overlapping circles for Y, X1 = family size, and X2 = family income, with regions a, b, c, and d.]

    $\hat{y} = .482 + .63 x_1 + .216 x_2$

    SSR = a + b + c = 18.95

    SST = a + b + c + d = 22

    $R^2$ = SSR / SST = (a + b + c) / (a + b + c + d) = 18.95 / 22 = 86%

    SSE = ?

    SSE = d = 22 - 18.95 = 3.05

    NOTE: c is explained by both X1 and X2.
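    The four regions of the diagram can be recovered from the sums of squares already reported. The sketch below uses the rounded figures from the slides, so the results are approximate. The region names follow the diagram; the variable names are illustrative.

```python
# Sums of squares reported on the slides (rounded values).
sst = 22.0               # total variation in Y: a + b + c + d
ssr_size = 16.5          # Y on family size alone: a + c
ssr_income = 15.12       # Y on family income alone: b + c
ssr_both = 18.95         # Y on both predictors: a + b + c

# Unique contribution of each predictor beyond the other one.
a = ssr_both - ssr_income            # explained by family size only
b = ssr_both - ssr_size              # explained by family income only

# Variation explained jointly by both predictors.
c = ssr_size + ssr_income - ssr_both

# Variation left unexplained by the model (SSE).
d = sst - ssr_both

print(round(a, 2), round(b, 2), round(c, 2), round(d, 2))
```

    Region b is the same "unique contribution of X2" obtained earlier as 5.5 - 3.05 = 2.45. The overlap picture is a teaching aid: with some data sets the computed c can come out negative, in which case the diagram no longer applies literally.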

    Remember: with $\hat{y} = .482 + .63 x_1 + .216 x_2$, SSE = $\sum (y - \hat{y})^2 = 3.05$, so the unique (additional) contribution of X2 = 5.5 - 3.05 = 2.45.

    Exercise 1: Redo the credit card analysis with SPSS.

    First, correlations and simple regression.

    Next, multiple regression (also ask for part and partial correlations).

    SPSS CREDIT CARD FILE

    http://localhost/var/www/apps/conversion/tmp/scratch_5//datastore01/website/mqm497/MQM%20497%20SPSS%20Data%20Files/MQM497%20Credit%20Card%20Regression%20Model.sav

    EXERCISE 2: Using the gss_2 data file, we are interested in understanding the role that the following demographics (age, educ, sibs, agewed), as well as respondent income (rincmdol), job satisfaction (satjob_2), and marriage satisfaction (hapmar_2), play in determining/predicting one's general happiness (happy_2).

    We also wish to know which of the above variables is the strongest predictor of general happiness (standardized regression coefficients).

    Use the gss_2 data file and conduct the appropriate analysis.

    NOTE:

    satjob_2 is coded as:          hapmar_2 is coded as:
    1 = Very Dissatisfied          1 = Not Too Happy
    2 = A Little Dissatisfied      2 = Pretty Happy
    3 = Pretty Satisfied           3 = Very Happy
    4 = Very Satisfied
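    Standardized regression coefficients put predictors measured in different units on a common scale, which is why they are used to compare predictor strength. Since the gss_2 data are not reproduced here, the sketch below illustrates the idea on the credit card example instead, using the rounded coefficients .632 and .216. The conversion shown (each b multiplied by the ratio of the predictor's standard deviation to that of Y) is the standard one; the variable names are illustrative.

```python
import math

# Sample data from the credit card example.
cards = [4, 6, 6, 7, 8, 7, 8, 10]            # y: actual # of credit cards
size = [2, 2, 4, 4, 5, 5, 6, 6]              # x1: family size
income = [14, 16, 14, 17, 18, 21, 17, 25]    # x2: family income ($000)

# Unstandardized coefficients from the multiple regression (rounded).
b1, b2 = 0.632, 0.216


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


# The ratio of standard deviations equals the square root of the ratio of
# sums of squared deviations, because the (n - 1) divisors cancel.
beta1 = b1 * math.sqrt(sum_sq_dev(size) / sum_sq_dev(cards))
beta2 = b2 * math.sqrt(sum_sq_dev(income) / sum_sq_dev(cards))

print(round(beta1, 2), round(beta2, 2))
```

    In this small example family size has the larger standardized coefficient, so it would be read as the stronger of the two predictors.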

    http://localhost/var/www/apps/conversion/tmp/scratch_5//datastore01/website/mqm497/MQM%20497%20SPSS%20Data%20Files/gss_2%20(gss%20with%20happy,%20satjob,%20%20and%20hapmar%20reverse%20coded%20for%20497).sav


    EXAMPLE 1: Income = 24000 + 1400 Gender

    Coded: Female = 0, Male = 1

    Meaning?

    Average income of females is $24,000.

    Males, on average, make $1,400 more than females.

    MULTIPLE REGRESSION EXAMPLE 2: Income = 12000 + 1000 Education Years + 800 Gender

    Coded: Female = 0, Male = 1

    Meaning?

    Average income of females with no education is $12,000.

    Among people of the same gender, every additional year of education results in an average additional income of $1,000.

    Males make, on average, $800 more than females who have the same number of years of education.
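    The dummy-variable interpretations can be verified by plugging values into the two example equations. The function names below are made up for illustration.

```python
def income_example_1(male):
    """Example 1: Income = 24000 + 1400 * Gender (Female = 0, Male = 1)."""
    return 24000 + 1400 * male


def income_example_2(education_years, male):
    """Example 2: Income = 12000 + 1000 * Education Years + 800 * Gender."""
    return 12000 + 1000 * education_years + 800 * male


# Example 1: the intercept is the female average; the coefficient is the gap.
female_avg = income_example_1(0)
male_avg = income_example_1(1)

# Example 2: compare a man and a woman with the same education (16 years),
# and two women whose education differs by one year.
gender_gap = income_example_2(16, 1) - income_example_2(16, 0)
education_step = income_example_2(13, 0) - income_example_2(12, 0)

print(female_avg, male_avg, gender_gap, education_step)
```

    Because the dummy takes only the values 0 and 1, its coefficient is the difference between the two groups' predicted values when every other predictor is held at the same level.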

    Exercise 4: Suppose we are interested in knowing what role, if any, demographic characteristics (i.e., age, sex_Dummy, educ, sibs, agewed, incomdol), as well as job satisfaction (satjob_2) and marriage satisfaction (hapmar_2), play in determining one's overall happiness in life (happy_2).

    Use the gss_2 data file and conduct the appropriate analysis.

    Exercise 3: Suppose we are interested in knowing what role, if any, the following demographic characteristics play in determining one's income (rincmdol):

    Age,
    Sex_Dummy (0 = male, 1 = female),
    Age first married (agewed),
    Years of education completed (educ), and
    Political party affiliation, republic (0 = Democrat, 1 = Republican).

    Use the gss_2 data file and conduct the appropriate analysis.

    Assignment 5

    Data file Salary.sav contains information about 474 employees hired by a Midwestern bank between 1969 and 1971 (NOTE: Due to SPSS site license restrictions, this hyperlink will not work if you are off campus). Of the 474 employees, 258 were men, 216 women, 370 white, and 104 non-white. The bank was subsequently involved in EEOC litigation; the bank was accused of gender and race discrimination in its hiring and compensation practices. The two issues that were of particular interest in the litigation were alleged gender and racial inequalities not only in the bank's beginning salaries (variable salbeg), but also in its later salaries (variable salnow).

    1. Print, examine, and interpret correlation coefficients between beginning salary (salbeg) and age in years (age), education in years (edlevel), employment category or job classification level, rated from 1 = lowest to 8 = highest (jobcat), and work experience in months (work).

    2. Conduct the appropriate analysis to see: (a) What role did each of the variables age, education (edlevel), employment category (jobcat), and work experience (work) play, holding all other variables constant, in determining the bank's beginning salaries? For example, what was the differential pay for one additional year of education among new hires who otherwise had the same age, employment category, and work experience? (b) Which of the above demographic characteristics had the strongest influence on beginning pay? How can you tell? (c) What percent of the differences in employees' beginning salaries can be explained by/attributed to differences in all of the above characteristics?

    http://www.cob.ilstu.edu/udrive/MQM/MQM%20497/Hemmasi/MQM497_Data_Files/SALARY.sav

    3. Now conduct the appropriate analysis to indicate, holding all other variables constant, what role gender (sex, male = 0, female = 1) played in determining beginning salaries at the bank. That is, what was the differential beginning pay between male and female employees who otherwise had the same age, education, employment category, and work experience? Does this evidence support the charges of gender discrimination in the bank's practices regarding initial compensation?

    4. During litigation, it was charged that the bank's unfair compensation practices had continued beyond its initial salary decisions. That is, the prosecution claimed that, with time, the beginning salary disparities between men and women not only did not shrink but widened further. Conduct the appropriate analysis to indicate (a) everything else being equal, what role gender played in determining employees' later salaries at the bank (salnow). That is, what was the average differential pay between male and female employees who otherwise had the same age, education, employment category, work experience, and job seniority (variable time represents seniority in terms of number of months employed at the bank)? (b) Compare the later pay disparities you have just identified with the beginning pay disparities you found in question 3 above to explain whether the evidence supports the prosecution's charges of continued gender discrimination beyond initial salary decisions, resulting in widening disparities in later pay.

    NOTE: For each question, provide thorough explanations on the corresponding pages and parts of your printout.

    QUESTIONS OR COMMENTS?