1 Logistic regression

1 Logistic regression. 2 Regression Regression is a set of techniques for exploiting the presence of statistical associations among variables to make


Logistic regression


Regression

• Regression is a set of techniques for exploiting the presence of statistical associations among variables to make predictions of values of one variable (the DV, TARGET or CRITERION) from knowledge of the values of other variables (the IVs or REGRESSORS).


Simple and multiple regression

• In SIMPLE regression, there is just one IV.

• In MULTIPLE regression, there are two or more IVs.

• In simple regression, a REGRESSION LINE is drawn through the points in the scatterplot.


General form of the simple regression equation


Estimates

• The points on the regression line serve as ESTIMATES of the target variable or DV Y from the values of the IV X.


Residuals

• We can make a good estimate of John’s score on Actual violence from a knowledge of the regression line and his score on Exposure.

• But such estimation is subject to ERROR.

• The error in our estimate is known as a RESIDUAL.


Goodness-of-fit: The LEAST-SQUARES criterion


Ordinary least-squares (OLS) regression

• This approach to regression is known as ORDINARY LEAST SQUARES (OLS) regression.

• There are other kinds of regression (such as LOGISTIC REGRESSION, today’s topic) which do not work in this way.
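The ideas of regression line, estimate, residual and least squares can be sketched in a few lines of Python. The Exposure and Actual violence scores below are invented for illustration and are not the data behind the slides; the function name `fit_simple_ols` is likewise hypothetical.

```python
# Hypothetical Exposure (X) and Actual violence (Y) scores, invented for illustration.
exposure = [1.0, 2.0, 3.0, 4.0, 5.0]
violence = [2.0, 3.0, 5.0, 4.0, 6.0]

def mean(values):
    return sum(values) / len(values)

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the line that minimises the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

intercept, slope = fit_simple_ols(exposure, violence)

# Points on the regression line are the ESTIMATES; the errors are the RESIDUALS.
estimates = [intercept + slope * xi for xi in exposure]
residuals = [yi - ei for yi, ei in zip(violence, estimates)]

# Coefficient of determination: proportion of the variance of Y accounted for.
ss_residual = sum(r ** 2 for r in residuals)
ss_total = sum((yi - mean(violence)) ** 2 for yi in violence)
r_squared = 1 - ss_residual / ss_total

print(intercept, slope, r_squared)
```

The least-squares line is the one for which `ss_residual` is as small as possible; any other intercept or slope would give a larger value.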


Coefficient of determination


Using more than one IV

• We could try to predict a person’s actual violence not only from exposure to screen violence, but also from additional variables, such as number of years of education and characteristics of the parents.

• These are problems in MULTIPLE REGRESSION.


Multiple regression


Partial regression coefficients

• In multiple regression, a PARTIAL REGRESSION COEFFICIENT is the estimated average change in the DV resulting from an increase of one unit in one particular IV with ALL THE OTHER IVs HELD CONSTANT.


Coefficient of determination

• In multiple regression, the COEFFICIENT OF DETERMINATION is the square of the multiple correlation coefficient.


What if the DV is a set of categories?

• Simple and multiple OLS regression assume that the DV and IVs consist of measures on an independent scale with units. The term CONTINUOUS VARIABLE is used for this sort of DV.

• But suppose we want to predict whether a person will suffer from a heart attack or contract a certain illness with known risk factors.

• Here, we are not predicting a VALUE, but membership of a CATEGORY.


Category prediction: the OLS approach

• You are trying to predict the presence or absence of a blood condition, which is thought to be made more likely by smoking and alcohol consumption

• Why not let 0 = Condition Absent; let 1 = Condition Present and calculate the usual OLS multiple regression equation?


Problems

• There are serious problems with OLS regression when the DV is a set of categories.

• None of the proposed solutions is entirely satisfactory.

• There are better approaches.
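One of the problems can be illustrated with a small sketch: an OLS line fitted to a 0/1 DV can produce "predicted" values below 0 or above 1, which cannot be read as probabilities. The data are invented for illustration, and this is only one of the difficulties with the OLS approach.

```python
# Hypothetical data: a risk-factor score (x) and a 0/1 DV
# (0 = Condition Absent, 1 = Condition Present).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]

mx = sum(x) / len(x)
my = sum(y) / len(y)

# Usual least-squares slope and intercept.
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

predictions = [intercept + slope * xi for xi in x]
print(min(predictions), max(predictions))  # one value is below 0, one is above 1
```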


Techniques for regression with a categorical DV and continuous IVs

The 2 most commonly used techniques are:

1. Discriminant analysis

2. Logistic regression


Discriminant analysis

• If all (or most) IVs are continuous, you can use DISCRIMINANT ANALYSIS (DA).

• But the DA model makes assumptions about the distributions of the IVs which data sets often fail to satisfy.


Logistic regression

• Logistic regression makes fewer assumptions than discriminant analysis.

• Logistic regression, moreover, is happy with CATEGORICAL IVs.


Logistic regression…

• It is suspected that smoking and drinking are risk factors in the incidence of a pre-morbid blood condition, characterised by the presence of an antibody.

• Records of the incidence of the condition in 100 patients are available, together with estimates of the amount they smoke and drink.


A section of the data


How many have the condition?


Forty-four patients have the condition


Back to OLS regression again

• CONSTANT = $M_Y - b_1 M_X$

• If X and Y are independent, the regression line will be close to horizontal ($b_1 \approx 0$).

• You might as well just guess the value of Y as $M_Y$ every time, whatever the value of X.


Regression line with independence

• When the variables show no association, the slope of the regression line is zero and the line runs horizontally through the mean MY of the criterion or dependent variable.


Intercept-only prediction

• Whatever the degree of association between X and Y, the INTERCEPT-ONLY prediction is

$\hat{Y} = M_Y$


Building regression models

• In model-building, we use the intercept-only (no regression) model as a BASELINE to assess the relative ability of the regression model to explain the variance of Y.

• In our current example, the equivalent prediction is that, since 44/100 people have the condition, an individual will NOT have the condition.


Guessing: the no-regression success rate

• If we decide to predict absence for every case, we shall be correct 56/100 times.

• This is the equivalent of INTERCEPT-ONLY prediction in linear regression.
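The no-regression success rate follows directly from the counts given above (44 of 100 patients with the condition). A minimal sketch, with hypothetical variable names:

```python
# Counts from the example: 44 patients have the condition, 56 do not.
n_present, n_absent = 44, 56
n_total = n_present + n_absent

# With no IVs, the best single guess is the more frequent category.
majority = "absent" if n_absent >= n_present else "present"
baseline_hit_rate = max(n_present, n_absent) / n_total

print(majority, baseline_hit_rate)
```

Any regression model then has to beat this baseline to be worth using.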


Assumption

• Either you have the disease or you don’t.

• As smoking and alcohol increase, however, we assume that the probability of developing the condition increases CONTINUOUSLY as a function of the IVs.

• In logistic regression, we estimate the probability of the condition with the LOGISTIC REGRESSION FUNCTION.

• Once the estimated probability reaches a cut-off (usually .5), the case is classified as a Yes, rather than a No.


A logistic regression function


The decision rule

If the predicted probability is .5 or higher, assign to the condition-present category.
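A minimal sketch of the logistic function and the decision rule, assuming the standard form p = 1 / (1 + e^(-Z)). The function names `logistic` and `classify` are hypothetical.

```python
import math

def logistic(z):
    """Map a logit Z (any real number) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(p, cutoff=0.5):
    """Decision rule: probability at or above the cut-off means condition present."""
    return "Yes" if p >= cutoff else "No"

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    p = logistic(z)
    print(z, round(p, 3), classify(p))
```

The probability rises smoothly with Z, but the classification changes only once, at the cut-off.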


The odds

• In an EXPERIMENT OF CHANCE (tossing a coin, rolling a die) the ODDS in favour of an event is the number of ways in which the event could occur, divided by the number of ways in which it could fail to occur.


The odds …

• Roll a die. There’s one way of getting a six; there are 5 ways of not getting a six.

• The odds in favour of a six when a die is rolled are 1 to 5.

• Suppose we know that out of 100 people, 44 have a certain antibody in their blood.

• The ODDS in favour of a person having the antibody are 44 to 56 or 44/56.


The log odds (logit)

• The odds measure suffers from ASYMMETRY OF RANGE.

• Unlikely events have odds between 0 and 1; likely events can have huge odds.

• The LOG ODDS (LOGIT) is the natural logarithm of the odds.

• Logit = ln(odds) = loge(odds).


When the logit is zero

• Suppose the odds are 50 to 50 (50/50 =1).

• Since the log of 1 is zero, a logit of zero means that the odds for are equal to the odds against.


Range of the logit

• The logit has a symmetrical range: a positive sign means the odds are in favour; a negative sign means the odds are against.

• The logit has no upper or lower limit: it has an unlimited range of values.


Example

• The odds in favour of a case having the antibody are 44/56 = 11/14.

• Logit = ln(11/14) = –.24

• The event is less likely than not.

• If the odds in favour were 56/44, the logit would be ln(56/44) = +.24.
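The arithmetic of this example can be checked with a short sketch. The helper names are hypothetical; the counts (44 and 56) and the die example come from the slides.

```python
import math

def odds(n_for, n_against):
    """Odds in favour: ways the event can occur divided by ways it can fail to occur."""
    return n_for / n_against

def logit_from_odds(o):
    """The logit is the natural logarithm of the odds."""
    return math.log(o)

odds_six = odds(1, 5)              # rolling a six: 1 to 5
odds_antibody = odds(44, 56)       # 44 with the antibody, 56 without
logit_antibody = logit_from_odds(odds_antibody)
logit_reversed = logit_from_odds(odds(56, 44))

print(round(logit_antibody, 2), round(logit_reversed, 2))
```

Reversing the odds changes only the sign of the logit, which is the symmetry of range described above.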


Probability

• A probability is a measure of likelihood ranging from 0 (an impossibility) to 1 (a certainty).

• The probability p of an event is the number of ways it can happen divided by the total number of outcomes.

• The probability of a six when a die is rolled is 1/6.


Relationship between p and odds

• A probability and the odds are both measures of likelihood.

• They are related by the equation odds = p / (1 – p), or equivalently p = odds / (1 + odds).
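Converting between the two measures can be sketched as follows, assuming the standard definitions odds = p / (1 - p) and p = odds / (1 + odds). The function names are hypothetical.

```python
def p_to_odds(p):
    """Odds in favour of an event with probability p (p must be less than 1)."""
    return p / (1.0 - p)

def odds_to_p(o):
    """Probability of an event whose odds in favour are o."""
    return o / (1.0 + o)

# Rolling a six: probability 1/6 corresponds to odds of 1 to 5.
odds_six = p_to_odds(1 / 6)

# Antibody example: odds of 44/56 correspond to a probability of .44.
p_antibody = odds_to_p(44 / 56)

print(odds_six, p_antibody)
```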


Antilogs

• We can write any positive number as an ANTILOG, that is, as the BASE raised to the power of the LOG of the number to that base.


The antilog


The odds as an antilog

$\text{odds} = \text{base}^{\log(\text{odds})} = e^{\ln(\text{odds})} = e^{\text{logit}}$


The probability and the logit

$p = \dfrac{\text{odds}}{1 + \text{odds}} = \dfrac{e^{\text{logit}}}{1 + e^{\text{logit}}}$


The logit equation

It is assumed that the logit is a linear function Z of the IVs, thus:

$Z = b_0 + b_1 X_1 + \dots + b_p X_p$


The logistic regression equation


The problem

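• In the logit equation, we must find values for the intercept and the regression coefficients such that the observed category memberships are as probable as possible under the model (the MAXIMUM LIKELIHOOD criterion). Good estimates should in turn give accurate assignment of cases to categories.

The logit equation

$Z = b_0 + b_1 X_1 + \dots + b_p X_p$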


No mathematical solution

• In logistic regression, there is no closed-form equivalent of the formulae for the intercept and coefficients in OLS regression.

• Instead, an iterative computing algorithm repeatedly improves the estimates of the coefficients in the hope that they will ‘converge’ to stable values.

• We must check this ‘convergence’ by examining the ITERATION HISTORY in the SPSS output.
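What such an iterative search looks like can be sketched with Newton-Raphson, one common choice for this problem. The slides do not say which algorithm SPSS uses, so this is an assumption for illustration only, and the data are invented.

```python
import math

# Hypothetical data, invented for illustration: one IV (x) and a 0/1 DV (y).
x = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

def prob(b0, b1, xi):
    """Logistic function applied to the logit Z = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))

def neg2_log_likelihood(b0, b1):
    """-2 log likelihood: smaller values mean the model fits the data better."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = prob(b0, b1, xi)
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return -2.0 * ll

def newton_step(b0, b1):
    """One Newton-Raphson update of (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = prob(b0, b1, xi)
        w = p * (1.0 - p)
        g0 += yi - p            # gradient of the log likelihood
        g1 += (yi - p) * xi
        h00 += w                # entries of the 2 x 2 information matrix
        h01 += w * xi
        h11 += w * xi * xi
    det = h00 * h11 - h01 * h01
    return (b0 + (h11 * g0 - h01 * g1) / det,
            b1 + (h00 * g1 - h01 * g0) / det)

b0, b1 = 0.0, 0.0
history = [(0, neg2_log_likelihood(b0, b1), b0, b1)]
converged = False
for iteration in range(1, 26):
    new_b0, new_b1 = newton_step(b0, b1)
    change = max(abs(new_b0 - b0), abs(new_b1 - b1))
    b0, b1 = new_b0, new_b1
    history.append((iteration, neg2_log_likelihood(b0, b1), b0, b1))
    if change < 0.001:          # stop when the estimates change by less than .001
        converged = True
        break

for row in history:
    print("iter %d  -2LL %.3f  b0 %.3f  b1 %.3f" % row)
```

The printed rows play the role of an iteration history: the -2 log likelihood falls and the coefficients settle down. If `converged` were still `False` after the loop, the estimates should not be trusted.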


Potential difficulties

• The algorithm will not run successfully if the IVs are too highly correlated. This is the familiar MULTICOLLINEARITY PROBLEM we encountered in OLS regression.

• As with any multiple regression, it can be difficult to attribute the DV (category membership in this case) unequivocally to any one IV.


The meaning of a logistic regression coefficient

• The regression coefficient is the increase in the logit in favour of an individual having the condition produced by an increment of one unit in the IV.

• Suppose that for Smoking, b = 1.1. An increase of one smoking unit (e.g. 10 cigarettes) increases the logit (the log odds) by 1.1.


Regression coefficients …

• In terms of the ODDS, an increase of one unit in the IV MULTIPLIES the original odds by the ANTILOG of b, that is, by $e^b$, or exp(b).

• Exp(1.1) = 3.0

• So an increase of one smoking unit results in the odds being MULTIPLIED by 3, that is, the odds in favour of the event are three times as great. (The probability itself does not triple.)
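A short sketch of this calculation. The coefficient b = 1.1 is the value used on the slide; the starting odds of 0.5 are a hypothetical value chosen only to show the effect on the probability.

```python
import math

b = 1.1                           # coefficient for Smoking from the example
odds_multiplier = math.exp(b)     # antilog of b

original_odds = 0.5               # hypothetical starting odds (1 to 2)
new_odds = original_odds * odds_multiplier

# Convert both odds to probabilities with p = odds / (1 + odds).
p_before = original_odds / (1 + original_odds)
p_after = new_odds / (1 + new_odds)

print(round(odds_multiplier, 1), round(p_before, 3), round(p_after, 3))
```

The odds are multiplied by about 3, while the probability rises by a smaller factor, which is why the coefficient is interpreted in terms of odds rather than probabilities.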


Correlations


Observations

• There’s a substantial correlation between one of the IVs and the DV. Good.

• There’s little association between the IVs. Very good.

• It looks good for the regression procedure.


Finding logistic regression


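Classification Table (Step 0) a,b

| Observed: Blood Condition | Predicted: No | Predicted: Yes | Percentage Correct |
|---|---|---|---|
| No | 56 | 0 | 100.0 |
| Yes | 44 | 0 | .0 |
| Overall Percentage | | | 56.0 |

a. Constant is included in the model.

b. The cut value is .500

This is ‘intercept-only’ estimation: every case is predicted No. The overall percentage correct (56.0) is the ‘intercept-only’ hit rate.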


Iteration

• The logistic regression procedure chooses values for the coefficients such that the observed category memberships are as probable as possible under the logistic regression function (maximum likelihood), which should in turn give accurate category assignment.

• But (unlike the situation in multiple regression) there is no closed-form solution to this problem: the procedure follows an algorithm which repeatedly works out new estimates until these ‘converge’ on the set of values eventually used.


Iteration…

• You need to check that the procedure produced estimates that did indeed converge.

• You should ask for an ITERATION HISTORY to confirm that convergence took place.


Iteration History (Step 1) a,b,c,d

| Iteration | -2 Log likelihood | Constant | Smoking | Alcohol |
|---|---|---|---|---|
| 1 | 98.522 | -.906 | .472 | .004 |
| 2 | 88.269 | -1.030 | .875 | -.029 |
| 3 | 80.474 | -1.202 | 1.530 | -.061 |
| 4 | 78.107 | -1.355 | 2.108 | -.078 |
| 5 | 77.999 | -1.392 | 2.256 | -.079 |
| 6 | 77.999 | -1.394 | 2.264 | -.078 |
| 7 | 77.999 | -1.394 | 2.264 | -.078 |

The last three columns are the coefficients.

a. Method: Enter

b. Constant is included in the model.

c. Initial -2 Log Likelihood: 137.186

d. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
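The initial -2 log likelihood of 137.186 can be reproduced from the intercept-only model, in which every patient is given the same probability of .44. The sketch below also takes the difference between the initial and final values, which is the usual chi-square measure of how much the IVs improve the fit; that step is standard practice rather than something shown in the output above.

```python
import math

n_yes, n_no = 44, 56              # patients with and without the condition
p_yes = n_yes / (n_yes + n_no)    # intercept-only probability for every case

# -2 log likelihood when every case is assigned the same probability.
initial_neg2ll = -2.0 * (n_yes * math.log(p_yes) + n_no * math.log(1.0 - p_yes))

final_neg2ll = 77.999             # value at the last iteration in the history
model_chi_square = initial_neg2ll - final_neg2ll

print(round(initial_neg2ll, 3), round(model_chi_square, 3))
```

With two IVs added (Smoking and Alcohol), this chi-square would be referred to 2 degrees of freedom.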


The Nagelkerke R² statistic

• The Nagelkerke statistic is a counterpart of the coefficient of determination R² in OLS multiple regression.

• It can be read as an approximate measure of the proportion of the total variation in incidence of the blood condition accounted for by regression.


Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 77.999 (a) | .447 | .599 |

a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
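Both R Square values can be recovered from the -2 log likelihoods, assuming the usual definitions: Cox & Snell R² = 1 - exp(-(D0 - D1)/n), and Nagelkerke R² is that value divided by its maximum possible value, 1 - exp(-D0/n). Here D0 is the initial -2 log likelihood (137.186, as reported with the iteration history), D1 the final one, and n the number of cases.

```python
import math

n = 100                 # number of patients
d0 = 137.186            # initial (intercept-only) -2 log likelihood
d1 = 77.999             # final -2 log likelihood

cox_snell = 1.0 - math.exp(-(d0 - d1) / n)
max_cox_snell = 1.0 - math.exp(-d0 / n)      # upper limit of Cox & Snell R Square
nagelkerke = cox_snell / max_cox_snell       # rescaled so that the maximum is 1

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Because Cox & Snell R² cannot reach 1, the Nagelkerke rescaling is the one usually quoted.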


Hosmer and Lemeshow Test

| Step | Chi-square | df | Sig. |
|---|---|---|---|
| 1 | 6.155 | 7 | .522 |


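Contingency Table for Hosmer and Lemeshow Test (Step 1)

| Group | Blood Condition = No: Observed | Blood Condition = No: Expected | Blood Condition = Yes: Observed | Blood Condition = Yes: Expected | Total |
|---|---|---|---|---|---|
| 1 | 13 | 11.254 | 0 | 1.746 | 13 |
| 2 | 12 | 11.004 | 1 | 1.996 | 13 |
| 3 | 4 | 3.311 | 0 | .689 | 4 |
| 4 | 17 | 19.523 | 7 | 4.477 | 24 |
| 5 | 5 | 6.087 | 3 | 1.913 | 8 |
| 6 | 4 | 4.134 | 10 | 9.866 | 14 |
| 7 | 1 | .687 | 9 | 9.313 | 10 |
| 8 | 0 | .000 | 9 | 9.000 | 9 |
| 9 | 0 | .000 | 5 | 5.000 | 5 |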
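The Hosmer and Lemeshow chi-square can be recomputed from this contingency table as the sum of (observed - expected)² / expected over all cells. In the sketch below, cells whose expected count is printed as .000 are skipped, on the assumption that their true contribution is negligible (observed is also 0 there).

```python
# Each tuple: (observed No, expected No, observed Yes, expected Yes), from the table.
groups = [
    (13, 11.254, 0, 1.746),
    (12, 11.004, 1, 1.996),
    (4, 3.311, 0, 0.689),
    (17, 19.523, 7, 4.477),
    (5, 6.087, 3, 1.913),
    (4, 4.134, 10, 9.866),
    (1, 0.687, 9, 9.313),
    (0, 0.000, 9, 9.000),
    (0, 0.000, 5, 5.000),
]

def hosmer_lemeshow_chi_square(rows):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    total = 0.0
    for obs_no, exp_no, obs_yes, exp_yes in rows:
        for observed, expected in ((obs_no, exp_no), (obs_yes, exp_yes)):
            if expected > 0:      # skip cells printed as .000 (assumption, see above)
                total += (observed - expected) ** 2 / expected
    return total

chi_square = hosmer_lemeshow_chi_square(groups)
df = len(groups) - 2              # number of groups minus 2

print(round(chi_square, 3), df)
```

A non-significant result (here Sig. = .522) indicates that the observed and expected counts do not differ reliably, that is, the model fits acceptably.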


Classification Table (Step 1) a

| Observed: Blood Condition | Predicted: No | Predicted: Yes | Percentage Correct |
|---|---|---|---|
| No | 51 | 5 | 91.1 |
| Yes | 10 | 34 | 77.3 |
| Overall Percentage | | | 85.0 |

a. The cut value is .500

A regression model is now applied. The overall percentage (85.0) is the hit rate using the regression model. This is obviously much better than the ‘intercept-only’ hit rate of 56%.
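The percentages in the classification table follow from the four cell counts. A minimal sketch, with hypothetical variable names:

```python
# Cell counts from the Step 1 classification table: (observed, predicted) -> count.
table = {
    ("No", "No"): 51, ("No", "Yes"): 5,
    ("Yes", "No"): 10, ("Yes", "Yes"): 34,
}

observed_no = table[("No", "No")] + table[("No", "Yes")]
observed_yes = table[("Yes", "No")] + table[("Yes", "Yes")]
n_total = observed_no + observed_yes

pct_correct_no = 100.0 * table[("No", "No")] / observed_no
pct_correct_yes = 100.0 * table[("Yes", "Yes")] / observed_yes
overall_hit_rate = 100.0 * (table[("No", "No")] + table[("Yes", "Yes")]) / n_total

print(round(pct_correct_no, 1), round(pct_correct_yes, 1), round(overall_hit_rate, 1))
```

The model is better at identifying patients without the condition (91.1%) than patients with it (77.3%).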


Variables in the Equation (Step 1) a

| Variable | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Smoking | 2.264 | .513 | 19.490 | 1 | .000 | 9.623 |
| Alcohol | -.078 | .085 | .846 | 1 | .358 | .925 |
| Constant | -1.394 | .373 | 13.979 | 1 | .000 | .248 |

a. Variable(s) entered on step 1: Smoking, Alcohol.

Exp(B) for Smoking (9.623) is the antilog of the coefficient of Smoking in the logit equation. Increasing Smoking by one unit MULTIPLIES the odds in favour of occurrence by about 10.
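The Wald and Exp(B) columns can be approximately recovered from B and S.E., assuming the usual definitions Wald = (B / S.E.)² and Exp(B) = e^B. Small discrepancies from the printed values are expected, because the B and S.E. shown are themselves rounded to three decimals.

```python
import math

# (B, S.E.) for each term, as printed in the Variables in the Equation table.
terms = {
    "Smoking": (2.264, 0.513),
    "Alcohol": (-0.078, 0.085),
    "Constant": (-1.394, 0.373),
}

results = {}
for name, (b, se) in terms.items():
    wald = (b / se) ** 2          # tests whether the coefficient differs from zero
    exp_b = math.exp(b)           # factor by which the odds are multiplied
    results[name] = (wald, exp_b)
    print(name, round(wald, 2), round(exp_b, 3))
```

Smoking has a large Wald statistic and an Exp(B) far from 1; Alcohol has a small Wald statistic and an Exp(B) close to 1, in line with the Sig. column.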


The logit equation

$Z = b_0 + b_1 X_1 + b_2 X_2 = -1.394 + 2.264(\text{Smoking}) - .078(\text{Alcohol})$


Logistic function

$p = \dfrac{e^{Z}}{1 + e^{Z}} = \dfrac{e^{-1.394 + 2.264\,\text{Smoking} - .078\,\text{Alcohol}}}{1 + e^{-1.394 + 2.264\,\text{Smoking} - .078\,\text{Alcohol}}}$
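Applying the fitted logistic function to individual cases can be sketched as follows. The coefficients are those in the Variables in the Equation table (B = 2.264 for Smoking); the two patients and their scores are hypothetical, since the measurement units of Smoking and Alcohol are not given here.

```python
import math

B_CONSTANT, B_SMOKING, B_ALCOHOL = -1.394, 2.264, -0.078

def predicted_probability(smoking, alcohol):
    """Estimated probability of the blood condition for given IV scores."""
    z = B_CONSTANT + B_SMOKING * smoking + B_ALCOHOL * alcohol
    return math.exp(z) / (1.0 + math.exp(z))

def classify(p, cutoff=0.5):
    return "Yes" if p >= cutoff else "No"

# Hypothetical patients: scores of zero on both IVs, then one unit of Smoking.
p_none = predicted_probability(smoking=0, alcohol=0)
p_smoker = predicted_probability(smoking=1, alcohol=0)

print(round(p_none, 3), classify(p_none))
print(round(p_smoker, 3), classify(p_smoker))
```

With both IVs at zero the estimate falls below the cut-off; one unit of Smoking is enough to move it above .5.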


Summary

• The incidence of the blood condition is indeed predictable from regression, which raises the hit rate from 56% to 85%.

• Smoking contributes significantly to the model.

• Alcohol does not contribute significantly to the model.


An example with categorical IVs

• An experiment on helpfulness.

• Do people tend to be more helpful to those of the opposite sex? This is the OPPOSITE-SEX DYADIC HYPOTHESIS.

• Male and female interviewers asked male and female participants whether they would help in a hypothetical situation.


Odds ratio (OR)

$\text{OR} = \dfrac{\text{odds (helping) in Male group}}{\text{odds (helping) in Female group}} = \dfrac{\text{number of males who helped} / \text{number who did not}}{\text{number of females who helped} / \text{number who did not}} = \dfrac{5/20}{17/8} = .118$


Odds ratio (OR)

$\text{OR} = \dfrac{\text{odds (helping) in Male group}}{\text{odds (helping) in Female group}} = \dfrac{\text{number of males who helped} / \text{number who did not}}{\text{number of females who helped} / \text{number who did not}} = \dfrac{11/14}{11/14} = 1$

[Chart: Helped? (No / Yes) by Interviewer’s sex]


Interpretation of ORs

• When the interviewer is female, the OR is .118; but when the interviewer is male, the OR is 1.

• This asymmetrical pattern only partially confirms the opposite-sex dyadic hypothesis.

• But it seems that an interaction is present.
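The two odds ratios can be recomputed from the helped / did-not-help counts quoted in the slides (5/20 and 17/8 with a female interviewer; 11/14 and 11/14 with a male interviewer). The function name is hypothetical.

```python
def odds_ratio(helped_m, not_helped_m, helped_f, not_helped_f):
    """Odds of helping in the Male group divided by odds of helping in the Female group."""
    return (helped_m / not_helped_m) / (helped_f / not_helped_f)

or_female_interviewer = odds_ratio(5, 20, 17, 8)
or_male_interviewer = odds_ratio(11, 14, 11, 14)

print(round(or_female_interviewer, 3), or_male_interviewer)
```

An OR of 1 means the two groups have the same odds of helping. Because the OR differs between the two interviewer conditions, the effect of the participant's sex depends on the interviewer's sex, which is what an interaction means.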


Complete the dialog


An interaction pattern


Classification Table (Step 0) a,b

| Observed: Would you help? | Predicted: No | Predicted: Yes | Percentage Correct |
|---|---|---|---|
| No | 56 | 0 | 100.0 |
| Yes | 44 | 0 | .0 |
| Overall Percentage | | | 56.0 |

a. Constant is included in the model.

b. The cut value is .500


Iteration History (Step 1) a,b,c,d

| Iteration | -2 Log likelihood | Constant | Interviewee(1) | Interviewer(1) | Interviewee(1) by Interviewer(1) |
|---|---|---|---|---|---|
| 1 | 125.107 | -.240 | .000 | .960 | -1.920 |
| 2 | 124.957 | -.241 | .000 | .995 | -2.131 |
| 3 | 124.957 | -.241 | .000 | .995 | -2.140 |
| 4 | 124.957 | -.241 | .000 | .995 | -2.140 |

The last four columns are the coefficients.

a. Method: Enter

b. Constant is included in the model.

c. Initial -2 Log Likelihood: 137.186

d. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.


Model Summary

| Step | -2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 124.957 (a) | .115 | .154 |

a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.


Hosmer and Lemeshow Test

| Step | Chi-square | df | Sig. |
|---|---|---|---|
| 1 | .000 | 2 | 1.000 |


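Contingency Table for Hosmer and Lemeshow Test (Step 1)

| Group | Would you help? = No: Observed | Would you help? = No: Expected | Would you help? = Yes: Observed | Would you help? = Yes: Expected | Total |
|---|---|---|---|---|---|
| 1 | 20 | 20.000 | 5 | 5.000 | 25 |
| 2 | 14 | 14.000 | 11 | 11.000 | 25 |
| 3 | 14 | 14.000 | 11 | 11.000 | 25 |
| 4 | 8 | 8.000 | 17 | 17.000 | 25 |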


Classification Table (Step 1) a

| Observed: Would you help? | Predicted: No | Predicted: Yes | Percentage Correct |
|---|---|---|---|
| No | 48 | 8 | 85.7 |
| Yes | 27 | 17 | 38.6 |
| Overall Percentage | | | 65.0 |

a. The cut value is .500


Variables in the Equation

| Step 1 (a) | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| Interviewee(1) | .000 | .570 | .000 | 1 | 1.000 | 1.000 |
| Interviewer(1) | .995 | .588 | 2.860 | 1 | .091 | 2.705 |
| Interviewee(1) by Interviewer(1) | -2.140 | .871 | 6.038 | 1 | .014 | .118 |
| Constant | -.241 | .403 | .358 | 1 | .549 | .786 |

a. Variable(s) entered on step 1: Interviewee, Interviewer, Interviewee * Interviewer.

The only significant result is an interaction between Sex of Interviewer and Sex of Interviewee.
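The Wald and Exp(B) columns follow directly from B and S.E., and the coefficients give the fitted probability of helping in each cell of the design. The sketch below uses the rounded values from the table, so the Wald statistics differ slightly in the third decimal place from those SPSS prints; the function names are illustrative, and the 0/1 indicator arguments follow the SPSS labels without assuming which sex is coded 1.

```python
import math

# (B, S.E.) as printed in Variables in the Equation
coefficients = {
    "Interviewee(1)": (0.000, 0.570),
    "Interviewer(1)": (0.995, 0.588),
    "Interviewee(1) by Interviewer(1)": (-2.140, 0.871),
    "Constant": (-0.241, 0.403),
}

def wald(b, se):
    """Wald statistic: the squared ratio of B to its standard error."""
    return (b / se) ** 2

def exp_b(b):
    """Exp(B): the multiplicative change in the odds."""
    return math.exp(b)

def predicted_probability(interviewee, interviewer):
    """Fitted probability of helping for 0/1 indicator values."""
    b = {name: est for name, (est, _) in coefficients.items()}
    logit = (
        b["Constant"]
        + b["Interviewee(1)"] * interviewee
        + b["Interviewer(1)"] * interviewer
        + b["Interviewee(1) by Interviewer(1)"] * interviewee * interviewer
    )
    return 1.0 / (1.0 + math.exp(-logit))

for name, (b, se) in coefficients.items():
    print(name, round(wald(b, se), 3), round(exp_b(b), 3))

for cell in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(cell, round(predicted_probability(*cell), 2))
```

The four fitted probabilities (.44, .44, .68 and .20) match the expected Yes frequencies of 11, 11, 17 and 5 out of 25 in the Hosmer and Lemeshow contingency table.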

Conclusion

• The OPPOSITE-SEX DYADIC HYPOTHESIS receives some support from this study.

• Both sexes, however, tended to be on the unhelpful side with female interviewers.

A loglinear analysis

• Since we have categorical variables, we can apply a loglinear analysis to the same data.

• We can expect a similar result.

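K-Way and Higher-Order Effects

| Effects | K | df | Likelihood Ratio Chi-Square | Likelihood Ratio Sig. | Pearson Chi-Square | Pearson Sig. | Number of Iterations |
|---|---|---|---|---|---|---|---|
| K-way and Higher Order Effects (a) | 1 | 7 | 13.673 | .057 | 12.960 | .073 | 0 |
| K-way and Higher Order Effects (a) | 2 | 4 | 12.229 | .016 | 11.688 | .020 | 2 |
| K-way and Higher Order Effects (a) | 3 | 1 | 6.323 | .012 | 6.231 | .013 | 2 |
| K-way Effects (b) | 1 | 3 | 1.443 | .695 | 1.272 | .736 | 0 |
| K-way Effects (b) | 2 | 3 | 5.906 | .116 | 5.457 | .141 | 0 |
| K-way Effects (b) | 3 | 1 | 6.323 | .012 | 6.231 | .013 | 0 |

a. Tests that k-way and higher order effects are zero.

b. Tests that k-way effects are zero.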

The K = 3 row reports the results of a test for an interaction between Sex of Interviewer and Sex of Interviewee. It is the only significant K-way effect.

Its p-value (.012 for the likelihood ratio chi-square, .013 for Pearson) is similar to the value .014 that we obtained with logistic regression.
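The likelihood ratio chi-square for the highest-order effect can be approximated by fitting the loglinear model that contains every two-way association but no three-way term, then comparing fitted with observed frequencies. The sketch below uses iterative proportional fitting in plain Python. The 2 x 2 x 2 cell counts are an assumption: they are reconstructed from the Hosmer and Lemeshow contingency table and the fitted logistic coefficients, and the 0/1 codes do not identify which sex is which.

```python
import math
from itertools import product

# counts[(interviewee, interviewer, help)], help: 0 = No, 1 = Yes (assumed layout)
counts = {
    (0, 0, 0): 14, (0, 0, 1): 11,
    (1, 0, 0): 14, (1, 0, 1): 11,
    (0, 1, 0): 8,  (0, 1, 1): 17,
    (1, 1, 0): 20, (1, 1, 1): 5,
}
cells = list(product((0, 1), repeat=3))

def margin(table, axes):
    """Sum the table over every variable not listed in axes."""
    totals = {}
    for cell in cells:
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0.0) + table[cell]
    return totals

def fit_no_three_way(observed, sweeps=50):
    """Iterative proportional fitting to all three two-way margins."""
    fitted = {cell: 1.0 for cell in cells}
    for _ in range(sweeps):
        for axes in ((0, 1), (0, 2), (1, 2)):
            target = margin(observed, axes)
            current = margin(fitted, axes)
            for cell in cells:
                key = tuple(cell[a] for a in axes)
                fitted[cell] *= target[key] / current[key]
    return fitted

def g_squared(observed, fitted):
    """Likelihood ratio chi-square: 2 * sum of O * ln(O / E)."""
    return 2.0 * sum(o * math.log(o / fitted[c]) for c, o in observed.items() if o > 0)

fitted = fit_no_three_way(counts)
g2 = g_squared(counts, fitted)
print(round(g2, 3))
```

With these reconstructed counts the statistic comes out close to the 6.323 reported in the K = 3 row, on 1 df.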

Conclusion

• The loglinear analysis leads to exactly the same conclusion as the logistic regression.

• Both techniques confirm that the most important determinant of whether help is given is the sexual homogeneity or heterogeneity of the participant-interviewer dyad.

The next step

• Our session has been merely an introduction to the technique of logistic regression.

• The next step is to do some further reading.

Getting started

• There’s an elementary section on logistic regression in

– Kinnear, P., & Gray, C. (2006). SPSS 14 made simple. Hove: Psychology Press. Chapter 14.

• The treatment is merely an outline, but it would get you started. At least it would familiarise you with the SPSS output.

An excellent textbook

• Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA: Thomson/Wadsworth.

• There’s a helpful introduction to logistic regression in Chapter 15, the multiple regression chapter.

Sage paperbacks

• Menard, S. (2002). Applied logistic regression analysis (2nd ed.). London: Sage.

• Jaccard, J. (2001). Interaction effects in logistic regression. London: Sage.

• Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Allyn & Bacon.

• Field, A. (2005). Discovering statistics using SPSS for Windows: Advanced techniques for the beginner (2nd ed.). London: Sage.

Appendix

Logarithms

A logarithmic system

• In a LOGARITHMIC SYSTEM, numbers are expressed as POWERS of a constant known as the BASE.

• The numbers 10, 100, 1000, 10,000, 100,000 and 1,000,000 can all be expressed as powers of 10 thus: 10 = 10¹, 100 = 10², 1000 = 10³, 10,000 = 10⁴, 100,000 = 10⁵ and 1,000,000 = 10⁶.

Definition of a log

• The log of a number is the power to which the base must be raised to equal the number.

• So, since 1000 = 10³, the log of 1000 to the base 10 is 3.

• The two most common bases are 10 and the number e, where e ≈ 2.72.

• Logs to the base e are known as NATURAL LOGS.
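The definition can be tried out with Python's standard math module, shown here as a small illustration: taking the log and then raising the base to that power should recover the original number, up to floating-point rounding.

```python
import math

number = 1000

# Log to the base 10: the power to which 10 must be raised to give 1000.
log_base_10 = math.log10(number)

# Natural log: the power to which e must be raised to give 1000.
natural_log = math.log(number)

# Raising each base to its log recovers the number.
recovered_from_10 = 10 ** log_base_10
recovered_from_e = math.e ** natural_log

print(log_base_10, round(natural_log, 3), round(math.e, 2))
print(round(recovered_from_10, 6), round(recovered_from_e, 6))
```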

Notation for logs

The antilog

The exponential function

The laws of logs

• The definition of the antilog is the key to the derivation of the LAWS OF LOGARITHMS.

• For example, the log of the PRODUCT is the SUM of the logs.
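One way the product law might be derived is sketched below, writing each number as a power of the base. The symbols b (the base), x, y, m and n are introduced here for illustration only.

```latex
\begin{align*}
x &= b^{m}, \quad y = b^{n}
  && \text{so that } \log_b x = m \text{ and } \log_b y = n \\
xy &= b^{m}\, b^{n} = b^{m+n} \\
\log_b (xy) &= m + n = \log_b x + \log_b y
\end{align*}
```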

Three laws of logs

• The log of the PRODUCT is the SUM of the logs.

• The log of the QUOTIENT is the DIFFERENCE between the logs.

• The log of the POWER is the power TIMES the log.
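The three laws can be checked numerically. The values of x, y and p below are arbitrary choices for illustration, and math.isclose is used because floating-point logs are only approximate.

```python
import math

x, y, p = 8.0, 2.0, 3.0

# Product law: log(xy) = log(x) + log(y)
product_law = math.isclose(math.log10(x * y), math.log10(x) + math.log10(y))

# Quotient law: log(x / y) = log(x) - log(y)
quotient_law = math.isclose(math.log10(x / y), math.log10(x) - math.log10(y))

# Power law: log(x ** p) = p * log(x)
power_law = math.isclose(math.log10(x ** p), p * math.log10(x))

print(product_law, quotient_law, power_law)
```

The same checks pass with math.log in place of math.log10, since the laws hold for any base.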

Things to remember about logs

• There is no log for a negative number.

• The log of zero is –∞ (minus infinity), in the sense that the log falls without limit as the number approaches zero from above.

• A log can have a negative value. It does so when the number is a PROPER FRACTION (numerator less than the denominator, so with a value between zero and 1).

• The log of 1 is zero.
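These points can be illustrated with Python's math module. Note that math.log raises a ValueError for zero as well as for negative numbers, so the behaviour near zero is shown here with progressively smaller positive values instead.

```python
import math

# No log for a negative number: math.log raises ValueError.
try:
    math.log(-1.0)
    negative_has_log = True
except ValueError:
    negative_has_log = False

# Approaching zero from above, the log keeps decreasing without limit.
logs_near_zero = [math.log(value) for value in (1e-1, 1e-10, 1e-100)]

# A proper fraction has a negative log; the log of 1 is zero.
log_of_half = math.log(0.5)
log_of_one = math.log(1.0)

print(negative_has_log, logs_near_zero, log_of_half, log_of_one)
```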