Intro to Logistic and Multinomial Regression
By Steven Drury
Outline of Presentation
● Binary Logistic Regression Model
● Cumulative logit model for ordinal outcome
● Generalized logit model for nominal outcome
Binary Logistic Regression Model
Suppose a pair of variables (x,y) is observed on some individuals, where x is a continuous variable, whereas y is a binary variable, that is, y assumes only two values.
Examples (Binary by nature).
relief v. no relief from a certain medical condition;
voted ‘yes’ v. ‘no’ on proposition XX;
HIV infection v. no infection;
won v. lost;
dead v. alive;
Continuous Variables Can be Made Binary
Examples:
(i) excess body weight loss <20% v. 20% or more;
(ii) PTSD symptom score ranges 17 to 85 with a cutoff at 50: diagnosed with PTSD if score>=50 v. no PTSD if score <50;
(iii) spends above $X on entertainment weekly v. spends less than $X;
(iv) runs marathon under 3 hours v. runs longer than 3 hours.
Scatterplot
If we plot y (with values coded 0 and 1) against x, the scatterplot may look something like this:
Problem
If we fit a linear regression model to these data, the residuals will not be normally distributed, so one of our model assumptions is violated.
Introducing the Binary Logistic Regression Model
A binary (dichotomous) logistic regression is used to model $P(Y=1)$. The model with predictors $x_1,\dots,x_k$ has the form
$$P(Y=1)=\frac{\exp(\beta_0+\beta_1 x_1+\dots+\beta_k x_k)}{1+\exp(\beta_0+\beta_1 x_1+\dots+\beta_k x_k)}.$$
Define the odds in favor of $Y=1$ as the ratio $\frac{P(Y=1)}{P(Y=0)}$. We can rewrite the logistic regression above in terms of the odds:
$$\frac{P(Y=1)}{P(Y=0)}=\exp(\beta_0+\beta_1 x_1+\dots+\beta_k x_k).$$
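As a quick numeric sketch of the model and its odds form (in Python, with made-up coefficients chosen purely for illustration), the probability computed from the linear predictor yields odds that equal $\exp(\beta_0+\beta_1 x_1)$ exactly:

```python
import math

def logistic_p(eta):
    """P(Y = 1) for linear predictor eta = b0 + b1*x1 + ... + bk*xk."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Made-up coefficients, purely for illustration:
b0, b1 = -2.0, 0.5
x1 = 3.0
eta = b0 + b1 * x1                 # linear predictor
p = logistic_p(eta)                # P(Y = 1)
odds = p / (1.0 - p)               # odds in favor of Y = 1

# The odds form of the model: P(Y=1)/P(Y=0) = Exp(b0 + b1*x1)
assert abs(odds - math.exp(eta)) < 1e-12
```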
Goodness of Fit
There are three ways to check how well the model fits the data:
Pseudo R-square – unlike R-square in linear regression, it does not represent the proportion of variation in Y explained by the model. Large values indicate a good fit.
Max-rescaled R-square – defined as the pseudo R-square divided by its maximum possible value. Again, large values indicate a good fit.
Hosmer-Lemeshow goodness-of-fit test – tests the null hypothesis that the model fits the data well. A P-value in excess of 0.05 is desirable.
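Both R-square measures can be computed directly from the model deviances (deviance = −2 log-likelihood). The Python sketch below uses the null and residual deviances reported for the psoriasis example later in these slides (61.827 and 43.779, with n = 45 patients) and reproduces the R-square values shown in the output:

```python
import math

def pseudo_r2(null_dev, resid_dev, n):
    """Cox-Snell pseudo R-square: 1 - exp((D_model - D_null)/n)."""
    return 1.0 - math.exp((resid_dev - null_dev) / n)

def max_rescaled_r2(null_dev, resid_dev, n):
    """Pseudo R-square divided by its maximum value, 1 - exp(-D_null/n)."""
    return pseudo_r2(null_dev, resid_dev, n) / (1.0 - math.exp(-null_dev / n))

# Deviances from the psoriasis fit shown later in these slides:
r2 = pseudo_r2(61.827, 43.779, 45)            # ~0.3304
r2_max = max_rescaled_r2(61.827, 43.779, 45)  # ~0.4424
```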
Interpretation of Regression Coefficients
When $x_1$ is continuous, the quantity $(\exp(\hat\beta_1)-1)\times 100\%$ represents the estimated percent change in odds in favor of $Y=1$ when $x_1$ is increased by one unit, and the other variables are held fixed. Indeed,
$$\frac{odds_{new}}{odds_{old}}=\frac{\exp(\hat\beta_0+\hat\beta_1(x_1+1)+\hat\beta_2 x_2+\dots+\hat\beta_k x_k)}{\exp(\hat\beta_0+\hat\beta_1 x_1+\hat\beta_2 x_2+\dots+\hat\beta_k x_k)}=\exp(\hat\beta_1).$$
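This one-unit-increase interpretation is easy to verify numerically. The sketch below (with made-up coefficients, for illustration only) shows that the ratio of new to old odds equals $\exp(\hat\beta_1)$ regardless of the predictor values:

```python
import math

# Made-up fitted coefficients b0, b1, b2, for illustration only:
b0, b1, b2 = 0.4, 0.8, -0.3

def odds(x1, x2):
    """Fitted odds in favor of Y = 1."""
    return math.exp(b0 + b1 * x1 + b2 * x2)

x1, x2 = 2.0, 5.0
ratio = odds(x1 + 1.0, x2) / odds(x1, x2)   # increase x1 by one unit
assert abs(ratio - math.exp(b1)) < 1e-9     # the ratio collapses to Exp(b1)

pct_change = (math.exp(b1) - 1.0) * 100.0   # percent change in odds
```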
Interpretation of Regression Coefficients
If $x_1$ is a categorical variable with two levels, then the quantity $\exp(\hat\beta_1)\times 100\%$ represents the estimated percent ratio of the odds for the upper level of $x_1$ (when $x_1=1$) to that for the lower level (when $x_1=0$), provided the other variables are held fixed. To see that, write
$$\frac{odds_{x_1=1}}{odds_{x_1=0}}=\frac{\exp(\hat\beta_0+\hat\beta_1+\hat\beta_2 x_2+\dots+\hat\beta_k x_k)}{\exp(\hat\beta_0+\hat\beta_2 x_2+\dots+\hat\beta_k x_k)}=\exp(\hat\beta_1).$$
Interpretation of Regression Coefficients
If $x_1$ is a categorical variable with $m$ levels, then $m-1$ dummy variables $x_{11},\dots,x_{1,m-1}$ are included into the model, with the $m$th level being the reference level. The quantity $\exp(\hat\beta_1)\times 100\%$ represents the estimated percent ratio of the odds for the level $x_{11}=1$ to that for the reference level, provided the other variables are held fixed. This follows from the fact that
$$\frac{odds_{x_{11}=1}}{odds_{ref}}=\frac{\exp(\hat\beta_0+\hat\beta_1+\hat\beta_m x_m+\dots+\hat\beta_k x_k)}{\exp(\hat\beta_0+\hat\beta_m x_m+\dots+\hat\beta_k x_k)}=\exp(\hat\beta_1).$$
Example
Dermatologists at a large hospital study patients with acute psoriasis, a skin disease. They randomly assign patients to three groups: taking drug A, drug B, or placebo. There are 45 patients in the study, 15 per group. The outcome is whether the patient felt a relief from psoriasis symptoms (1=relief, 0=no relief). Data are collected on gender, age, and group. The following SAS code fits the logistic regression model to the data.
SAS Application: Code
data psoriasis;
input gender$ age drug$ relief$ @@;
datalines;
M 25 A Yes M 25 A Yes M 41 A Yes M 42 A Yes
M 43 A Yes M 51 A Yes M 59 A Yes M 59 A Yes
F 29 A Yes F 35 A Yes F 42 A Yes F 56 A Yes
F 65 A Yes F 40 A No F 61 A No M 29 B Yes
M 33 B Yes M 39 B Yes M 42 B Yes M 46 B Yes
M 42 B No M 48 B No M 62 B No F 36 B Yes
F 47 B Yes F 28 B No F 38 B No F 39 B No
F 50 B No F 60 B No M 42 P Yes M 46 P Yes
M 24 P No M 25 P No M 60 P No M 67 P No
F 28 P Yes F 32 P Yes F 35 P Yes F 42 P No
F 48 P No F 53 P No F 57 P No F 58 P No
F 65 P No
;
proc logistic data=psoriasis;
class gender (ref='F') drug (ref='P') / param=ref;
model relief(event='Yes') = gender age drug / rsq lackfit;
run;
The Important Features of the SAS Code
Options (ref='F') and (ref='P') define reference categories for gender and drug.
Option param=ref creates proper dummy variables for gender and drug.
Option rsq computes the pseudo R-square and max-rescaled R-square.
Option lackfit performs the Hosmer-Lemeshow goodness-of-fit test.
Relevant SAS Output
R-Square 0.3304    Max-rescaled R-Square 0.4424
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error
Wald Chi-Square
Pr > ChiSq
Intercept 1 1.9698 1.5911 1.5327 0.2157
gender M 1 1.1080 0.7592 2.1297 0.1445
age 1 -0.0722 0.0344 4.4035 0.0359
drug A 1 2.9828 1.0969 7.3945 0.0065
drug B 1 0.3443 0.8445 0.1662 0.6835
Odds Ratio Estimates
Effect Point Estimate 95% Wald Confidence Limits
gender M vs F 3.028 0.684 13.410
age 0.930 0.870 0.995
drug A vs P 19.744 2.300 169.484
drug B vs P 1.411 0.270 7.386
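Every entry in this table can be reproduced from the estimates and standard errors above: the point estimate is $\exp(\hat\beta)$ and the Wald limits are $\exp(\hat\beta\pm 1.96\,SE)$. A quick Python check:

```python
import math

# (Estimate, Standard Error) pairs from the SAS table above:
params = {
    "gender M vs F": (1.1080, 0.7592),
    "age":           (-0.0722, 0.0344),
    "drug A vs P":   (2.9828, 1.0969),
    "drug B vs P":   (0.3443, 0.8445),
}

z = 1.959964  # 97.5th percentile of the standard normal distribution
ors = {}
for name, (est, se) in params.items():
    ors[name] = (math.exp(est),            # point estimate of the odds ratio
                 math.exp(est - z * se),   # lower 95% Wald limit
                 math.exp(est + z * se))   # upper 95% Wald limit
# e.g. ors["gender M vs F"] reproduces (3.028, 0.684, 13.410)
```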
Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square DF Pr > ChiSq
5.1720 7 0.6390
Results
Age and drug A are significant predictors of relief from psoriasis (age at the 5%, drug A at the 1%).
This model has a good fit because the P-value of the Hosmer-Lemeshow test is 0.6390 > 0.05. Also the pseudo R-squared (0.3304) and max-rescaled R-squared (0.4424) are not very small.
The fitted model is
$$\frac{\hat P(\text{relief})}{\hat P(\text{no relief})}=\exp(1.9698+1.1080\,\text{Male}-0.0722\,\text{age}+2.9828\,\text{drugA}+0.3443\,\text{drugB}).$$
Interpretation of Beta Coefficients
The odds in favor of psoriasis relief for males are 3.028 times those for females (302.8%).
As age increases by one year, the odds in favor of psoriasis relief decrease by 7%=(0.93-1)100%.
The odds in favor of psoriasis relief for drug A patients are 19.744 times those of placebo patients (or 1,974.4%).
The odds in favor of psoriasis relief for drug B patients are 1.411 times those of placebo patients (or 141.1%).
R Application Code
Gender<-c(1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0)
Age<-c(25,25,41,42,43,51,59,59,29,35,42,56,65,40,61,29,33,39,42,46,42,48,62,36,47,28,38,39,50,60,42,46,24,25,60,67,28,32,35,42,48,53,57,58,65)
DrugA<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
DrugB<-c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
Relief<-c(1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0)
logr.drugs <- glm(Relief ~ Gender + Age + DrugA + DrugB, family=binomial)
summary(logr.drugs)
confint(logr.drugs)
exp(coef(logr.drugs))
exp(cbind(OR = coef(logr.drugs), confint(logr.drugs)))
Relevant R Output
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0900 -0.7084 0.2968 0.8228 1.6565
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.96989 1.59115 1.238 0.21571
Gender 1.10799 0.75922 1.459 0.14446
Age -0.07221 0.03441 -2.098 0.03586 *
DrugA 2.98290 1.09693 2.719 0.00654 **
DrugB 0.34432 0.84455 0.408 0.68350
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.827 on 44 degrees of freedom
Residual deviance: 43.779 on 40 degrees of freedom
AIC: 53.779
Number of Fisher Scoring iterations: 5
2.5 % 97.5 %
(Intercept) -1.0468032 5.34676953
Gender -0.3436048 2.68635060
Age -0.1484526 -0.01007708
DrugA 1.0484500 5.47015945
DrugB -1.3282170 2.04585735
(Intercept) Gender Age DrugA DrugB
7.1699028 3.0282637 0.9303383 19.7450880 1.4110286
OR 2.5 % 97.5 %
(Intercept) 7.1699028 0.3510582 209.9290309
Gender 3.0282637 0.7092092 14.6780122
Age 0.9303383 0.8620409 0.9899735
DrugA 19.7450880 2.8532251 237.4980585
DrugB 1.4110286 0.2649492 7.7357880
Minitab: I’m like 95% Confident I Could Train A Chimp To Do This
SPSS Application: Syntax
LOGISTIC REGRESSION VARIABLES relief
 /METHOD=ENTER gender age drugA drugB
 /PRINT=GOODFIT CI(95)
 /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
Relevant SPSS Output
Model Summary
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     43.779a            .330                  .442
Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 5.172 7 .639
Variables in the Equation
B S.E. Wald df Sig. Exp(B) 95% C.I.for EXP(B)
Lower Upper
Step 1a
gender 1.108 .759 2.130 1 .144 3.028 .684 13.410
age -.072 .034 4.404 1 .036 .930 .870 .995
drugA 2.983 1.097 7.395 1 .007 19.745 2.300 169.501
drugB .344 .845 .166 1 .683 1.411 .270 7.386
Constant 1.970 1.591 1.533 1 .216 7.170
Multinomial Logistic Regression
A natural extension of the binary logistic regression is when the outcome variable is categorical assuming more than two values, e.g., 0, 1, or 2. This model is called a multinomial logistic regression model.
Two models are distinguished: for ordinal outcome (ordered categories, such as size) and for nominal outcome (unordered categories, such as race).
Cumulative Logit Model for Ordinal Outcome
For example, if $m=4$, the cumulative probabilities $P(y\le j)$ are
$$P(y\le 1)=P(y=1),\qquad P(y\le 2)=P(y=1)+P(y=2),$$
$$P(y\le 3)=P(y=1)+P(y=2)+P(y=3),$$
and
$$P(y\le 4)=P(y=1)+P(y=2)+P(y=3)+P(y=4)=1.$$
Cumulative Logit Model for Ordinal Outcome
Define the odds of outcome in category $j$ or below as the ratio $\frac{P(y\le j)}{P(y>j)}$. These are termed cumulative odds.
Define the logits of the cumulative probabilities (called cumulative logits) by
$$\operatorname{logit} P(y\le j)=\ln\frac{P(y\le j)}{P(y>j)}.$$
Cumulative Logit Model for Ordinal Outcome
For instance, if $m=4$, the cumulative logits are
$$\operatorname{logit} P(y\le 1)=\ln\frac{P(y\le 1)}{1-P(y\le 1)}=\ln\frac{P(y=1)}{P(y=2)+P(y=3)+P(y=4)},$$
$$\operatorname{logit} P(y\le 2)=\ln\frac{P(y\le 2)}{1-P(y\le 2)}=\ln\frac{P(y=1)+P(y=2)}{P(y=3)+P(y=4)},$$
and
$$\operatorname{logit} P(y\le 3)=\ln\frac{P(y\le 3)}{1-P(y\le 3)}=\ln\frac{P(y=1)+P(y=2)+P(y=3)}{P(y=4)}.$$
Since $P(y\le 4)=1$, the logit of $P(y\le 4)$ is not defined.
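A small Python sketch with hypothetical category probabilities (an $m=4$ outcome with probabilities 0.1, 0.2, 0.4, 0.3, chosen purely for illustration) makes these definitions concrete:

```python
import math

# Hypothetical probabilities for an ordinal outcome with m = 4 levels:
p = [0.1, 0.2, 0.4, 0.3]   # P(y=1), ..., P(y=4); sums to 1

def cum_logit(j):
    """logit P(y <= j) = ln[ P(y <= j) / P(y > j) ], defined for j = 1..m-1."""
    below = sum(p[:j])
    return math.log(below / (1.0 - below))

logits = [cum_logit(j) for j in (1, 2, 3)]
# Cumulative probabilities grow with j, so the cumulative logits increase:
assert logits[0] < logits[1] < logits[2]
# For j = m the cumulative probability is 1, so the logit is undefined
# (it would require dividing by P(y > 4) = 0).
```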
Cumulative Logit Model for Ordinal Outcome
The cumulative logit model for an ordinal outcome $y$ and predictors $x_1,\dots,x_k$ has the form
$$\operatorname{logit} P(y\le j)=\alpha_j+\beta_1 x_1+\dots+\beta_k x_k,\quad j=1,\dots,m-1.$$
Note that this model requires a separate intercept parameter $\alpha_j$ for each cumulative probability. SAS uses this model.
Note that some software packages (in particular, SPSS) use the model
$$\operatorname{logit} P(y\le j)=\alpha_j-(\beta_1 x_1+\dots+\beta_k x_k),\quad j=1,\dots,m-1.$$
Goodness of Model Fit
There are only two quantities that may be used to check the model fit. They are: Pseudo R-square and Max-rescaled R-square.
We cannot perform the Hosmer-Lemeshow goodness-of-fit test in multinomial logistic regression.
Interpretation of Beta Coefficients
When $x_1$ is continuous, the quantity $(\exp(\hat\beta_1)-1)\times 100\%$ represents the estimated percent change in cumulative odds when $x_1$ is increased by one unit, and the other predictors are held fixed.
If $x_1$ is a categorical variable with several levels, then $\exp(\hat\beta_1)\times 100\%$ represents the estimated percent ratio of the cumulative odds for the level $x_{11}=1$ to that for the reference level, controlling for the other predictors.
Examples of Ordinal Outcomes
Example 1. A marketing research firm wants to investigate what factors influence the size of soda (small, medium, large or extra large) that people order at a fast-food chain.
Example 2. A researcher is interested in what factors influence medaling in Olympic swimming (gold, silver, bronze).
Example 3. A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school.
Numeric Example
Among the variables collected by the California Health Interview Survey (CHIS) there were the demographic variables:
gender (M/F)
age (in years)
marital status (Married/Not Married)
highest educational degree obtained (<HS/HSgrad/HS+)
health condition (Poor/Fair/Good/Excellent)
The following SAS code runs a cumulative logit model for the ordinal outcome variable health for the data on 32 respondents.
SAS Application: Code
data CHIS;
input gender$ age marital$ educ$ health$ @@;
datalines;
M 46 yes 1 3 M 62 yes 1 1 M 52 yes 2 4 M 50 no 1 2 F 44 no 3 1
F 68 no 2 2 F 50 no 3 2 F 93 no 1 1 M 60 yes 2 4 M 88 no 3 3
M 58 yes 2 4 M 62 yes 2 3 F 64 yes 3 3 F 49 yes 2 3 F 71 yes 3 4
M 32 no 3 3 F 88 no 2 1 F 36 yes 3 4 M 85 no 3 3 F 38 no 3 2
M 49 yes 3 4 F 43 no 1 3 M 61 yes 2 3 M 47 yes 3 4 F 36 yes 1 3
M 44 yes 1 4 M 41 no 2 3 M 55 yes 1 3 M 37 no 3 2 M 58 yes 2 4
F 40 yes 2 3 F 97 no 2 1
;
proc format;
value $maritalfmt 'yes'='married' 'no'='not married';
value $educfmt '1'='<HS' '2'='HSgrad' '3'='HS+';
value $healthfmt '1'='poor' '2'='fair' '3'='good' '4'='excellent';
run;
proc logistic;
class gender (ref='M') marital (ref='yes') educ (ref='3') / param=ref;
model health = gender age marital educ / link=clogit rsq;
run;
The Important Features of the SAS Code
Ordinal variables should be entered into SAS as numbers 1, 2, etc. Otherwise SAS orders them alphabetically.
Option link=clogit specifies the cumulative logit link function. Note that by default, link=logit.
Option lackfit cannot be specified because the Hosmer-Lemeshow goodness-of-fit test cannot be performed in the case of multinomial logistic regression.
Relevant SAS Output
R-Square 0.5988    Max-rescaled R-Square 0.6466
Analysis of Maximum Likelihood Estimates
Parameter DF Estimate Standard Error
Wald Chi-Square
Pr > ChiSq
Intercept 1 1 -8.3785 2.2137 14.3258 0.0002
Intercept 2 1 -6.6506 1.9151 12.0597 0.0005
Intercept 3 1 -2.9271 1.5069 3.7731 0.0521
gender F 1 1.8504 0.8187 5.1082 0.0238
age 1 0.0251 0.0234 1.1540 0.2827
marital no 1 4.1511 1.2304 11.3819 0.0007
educ 1 1 2.2937 1.0633 4.6532 0.0310
educ 2 1 0.9264 0.9206 1.0125 0.3143
Odds Ratio Estimates
Effect Point Estimate 95% Wald Confidence Limits
gender F vs M 6.363 1.279 31.662
age 1.025 0.980 1.074
marital no vs yes 63.501 5.694 708.125
educ 1 vs 3 9.912 1.233 79.660
educ 2 vs 3 2.525 0.416 15.343
Results
Gender, marital status and education are associated with health status. Age is not.
This model has a reasonably good fit because the pseudo R-square and max-rescaled R-square are pretty large.
Results
The fitted model is:
$$\operatorname{logit}\hat P(\text{poor health})=\ln\frac{\hat P(\text{poor health})}{\hat P(\text{fair, good, or excellent health})}=-8.3785+1.8504\,\text{female}+0.0251\,\text{age}+4.1511\,\text{not married}+2.2937\,\text{'<HS'}+0.9264\,\text{'HSgrad'},$$
$$\operatorname{logit}\hat P(\text{poor or fair health})=\ln\frac{\hat P(\text{poor or fair health})}{\hat P(\text{good or excellent health})}=-6.6506+1.8504\,\text{female}+0.0251\,\text{age}+4.1511\,\text{not married}+2.2937\,\text{'<HS'}+0.9264\,\text{'HSgrad'},$$
and
$$\operatorname{logit}\hat P(\text{poor, fair, or good health})=\ln\frac{\hat P(\text{poor, fair, or good health})}{\hat P(\text{excellent health})}=-2.9271+1.8504\,\text{female}+0.0251\,\text{age}+4.1511\,\text{not married}+2.2937\,\text{'<HS'}+0.9264\,\text{'HSgrad'}.$$
Interpretation of Beta Coefficients
The estimated odds of worse health for females are 6.363 times those for males (or 636.3%).
As age increases by one year, the estimated odds of worse health increase by 2.5%=(1.025-1)100% (not significant).
The estimated odds of worse health for not married people are 63.501 times those for married (or 6,350.1%).
The estimated odds of worse health for <HS are 9.912 times those for HS+ (or 991.2%).
The estimated odds of worse health for HSgrad are 2.525 times those for HS+ (or 252.5%) (not significant).
These ratios apply to all of the three cumulative probabilities P(poor health), P(poor or fair health) and P(poor, fair, or good health).
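To see how the fitted cumulative logit model turns into probabilities, the sketch below plugs a hypothetical respondent (female, age 50, not married, education HS+; chosen purely for illustration) into the intercepts and coefficients from the SAS output above and recovers all four category probabilities:

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Fitted intercepts (poor; poor or fair; poor, fair, or good) and slopes
# from the SAS output above:
alphas = [-8.3785, -6.6506, -2.9271]
b_female, b_age, b_notmarried = 1.8504, 0.0251, 4.1511
b_lths, b_hsgrad = 2.2937, 0.9264

# Hypothetical respondent: female, age 50, not married, education HS+
# (both education dummies are 0). Chosen for illustration only.
s = b_female * 1 + b_age * 50 + b_notmarried * 1 + b_lths * 0 + b_hsgrad * 0

cum = [logistic(a + s) for a in alphas]   # P(y<=1), P(y<=2), P(y<=3)
cats = {
    "poor":      cum[0],
    "fair":      cum[1] - cum[0],
    "good":      cum[2] - cum[1],
    "excellent": 1.0 - cum[2],
}
assert cum[0] < cum[1] < cum[2]                  # cumulative probs increase
assert abs(sum(cats.values()) - 1.0) < 1e-12     # category probs sum to 1
```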
R Application: Code
Gender<-c(1,0,0,0,0,1,0,0,0,1,1,1,1,0,0,1,1,0,1,0,1,1,0,1,1,1,0,0,1,1,1,1)
Age<-c(62,44,93,88,97,50,68,50,38,37,46,88,62,64,49,32,85,43,61,36,41,55,40,52,60,58,71,36,49,47,44,58)
Marital<-c(1,0,0,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1)
Educ<-c(1,3,1,2,2,1,2,3,3,3,1,3,2,3,2,3,3,1,2,1,2,1,2,2,2,2,3,3,3,3,1,2)
Health<-c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4)
require(foreign)
require(ggplot2)
require(MASS)
require(Hmisc)
require(reshape2)
Health<-factor(Health)
logr.healthlevel <- polr(Health ~ Gender + Age + Marital + Educ, Hess=TRUE)
summary(logr.healthlevel)
ctable<-coef(summary(logr.healthlevel))
p <- pnorm(abs(ctable[, "t value"]), lower.tail = FALSE) * 2
(ctable <- cbind(ctable, "p value" = p))
ci <- confint(logr.healthlevel)
exp(cbind(OR = coef(logr.healthlevel), ci))
Relevant R Output
Coefficients:
Value Std. Error t value
Gender 1.84490 0.81397 2.267
Age -0.02338 0.02234 -1.046
Marital 4.19114 1.24772 3.359
Educ 1.12441 0.53246 2.112
Intercepts:
Value Std. Error t value
1|2 1.0409 1.9805 0.5256
2|3 2.7789 2.0141 1.3797
3|4 6.4879 2.4298 2.6702
Residual Deviance: 54.22504
AIC: 68.22504
Value Std. Error t value p value
Gender 1.84489537 0.81396860 2.2665437 0.0234181163
Age -0.02337853 0.02233984 -1.0464952 0.2953324826
Marital 4.19113850 1.24772220 3.3590318 0.0007821608
Educ 1.12440869 0.53245920 2.1117274 0.0347098361
1|2 1.04088922 1.98053493 0.5255596 0.5991942069
2|3 2.77894049 2.01410976 1.3797364 0.1676678305
3|4 6.48793684 2.42978798 2.6701658 0.0075813794
OR 2.5 % 97.5 %
Gender 6.3274377 1.3836035 35.292605
Age 0.9768926 0.9323719 1.019613
Marital 66.0980005 8.1282254 1546.311706
Educ 3.0783960 1.1478311 9.538690
Minitab: I’m like 95% Confident I Could Train A Chimp To Do This
SPSS Application: Syntax
PLUM health BY gender marital educ WITH age
 /LINK=LOGIT.
Relevant SPSS Output
Pseudo R-Square
Cox and Snell .599
Nagelkerke .647
McFadden .351
Parameter Estimates
Estimate Std. Error Wald df Sig. 95% Confidence Interval
Lower Bound Upper Bound
Threshold
[health = 1.00] -8.379 2.214 14.326 1 .000 -12.717 -4.040
[health = 2.00] -6.651 1.915 12.060 1 .001 -10.404 -2.897
[health = 3.00] -2.927 1.507 3.773 1 .052 -5.881 .026
Location
age -.025 .023 1.154 1 .283 -.071 .021
[gender=.00] -1.850 .819 5.108 1 .024 -3.455 -.246
[gender=1.00] 0a . . 0 . . .
[marital=.00] -4.151 1.230 11.382 1 .001 -6.563 -1.740
[marital=1.00] 0a . . 0 . . .
[educ=1.00] -2.294 1.063 4.653 1 .031 -4.378 -.210
[educ=2.00] -.926 .921 1.013 1 .314 -2.731 .878
[educ=3.00] 0a . . 0 . . .
Generalized Logit Model for Nominal Outcome
• Suppose $y$ is a nominal outcome with $m$ levels, and assume that the $m$th level is the reference.
• Define the generalized logit function as
$$\operatorname{logit} P(y=j)=\ln\frac{P(y=j)}{P(y=m)},\quad\text{where } j=1,\dots,m-1.$$
For example, if $m=4$,
$$\operatorname{logit} P(y=1)=\ln\frac{P(y=1)}{P(y=4)},\qquad \operatorname{logit} P(y=2)=\ln\frac{P(y=2)}{P(y=4)},$$
and
$$\operatorname{logit} P(y=3)=\ln\frac{P(y=3)}{P(y=4)}.$$
Generalized Logit Model for Nominal Outcome
The generalized logit model for a nominal outcome with $m$ levels and predictors $x_1,\dots,x_k$ has the form
$$\operatorname{logit} P(y=j)=\beta_{0j}+\beta_{1j} x_1+\dots+\beta_{kj} x_k,\quad j=1,\dots,m-1.$$
Note that ALL the regression coefficients differ for different $j$'s.
Interpretation of Beta Coefficients
When $x_1$ is continuous, the quantity $(\exp(\hat\beta_{1j})-1)\times 100\%$ represents the estimated percent change in odds in favor of $y=j$ as opposed to $y=m$ when $x_1$ is increased by one unit, and the other predictors are held fixed.
If $x_1$ is a categorical variable with several levels, then $\exp(\hat\beta_{1j})\times 100\%$ represents the estimated percent ratio of the odds in favor of $y=j$ as opposed to $y=m$ for the level $x_{11}=1$ to that for the reference level, controlling for the other predictors.
Examples of Nominal Outcomes
Example 1. People's occupational choices might be influenced by their parents' occupations and their own education level. We can study the relationship of one's occupation choice with education level and father's occupation.
Example 2. Entering high school students make program choices among general program, vocational program and academic program. Their choice might be modeled using their writing score and their social economic status.
Numeric Example
Over the course of a school year, third-graders from three different schools are exposed to three different styles of mathematics instruction: a self-paced computer-learning style, a team approach, and a traditional class approach. The students are asked which style they prefer, and their responses, classified by the type of program they are in (a regular school day versus a regular school day supplemented with an afternoon school program), are recorded.
The following SAS code runs a generalized logit model for the nominal outcome variable style (self/team/class).
SAS Application: Code
data school;
length program$ 9;
input school program$ style$ count @@;
datalines;
1 regular self 10 1 regular team 17 1 regular class 26
1 afternoon self 5 1 afternoon team 12 1 afternoon class 50
2 regular self 21 2 regular team 17 2 regular class 26
2 afternoon self 16 2 afternoon team 12 2 afternoon class 36
3 regular self 15 3 regular team 15 3 regular class 16
3 afternoon self 12 3 afternoon team 12 3 afternoon class 20
;
proc logistic;
freq count;
class school (ref='1') program (ref='afternoon') / param=ref;
model style(order=data) = school program / link=glogit rsq;
run;
The Important Features of the SAS Code
The data set contains frequencies of identical observations, so the freq statement has to be used in proc logistic.
The option order=data instructs SAS to order the levels of the outcome variable as they appear in the data set; the last level ('class') then serves as the reference.
Option link=glogit specifies the generalized logit link function.
Option lackfit cannot be specified because the Hosmer-Lemeshow goodness-of-fit test cannot be performed in the case of multinomial logistic regression.
Relevant SAS Output
R-Square 0.0808 Max-rescaled R-Square 0.0926
Analysis of Maximum Likelihood Estimates
Parameter style DF Estimate Standard Error
Wald Chi-Square
Pr > ChiSq
Intercept self 1 -1.9707 0.3204 37.8418 <.0001
Intercept team 1 -1.3088 0.2596 25.4174 <.0001
school 2 self 1 1.0828 0.3539 9.3598 0.0022
school 2 team 1 0.1801 0.3172 0.3224 0.5702
school 3 self 1 1.3147 0.3839 11.7262 0.0006
school 3 team 1 0.6556 0.3395 3.7296 0.0535
program regular self 1 0.7474 0.2820 7.0272 0.0080
program regular team 1 0.7426 0.2706 7.5332 0.0061
Relevant SAS Output
Odds Ratio Estimates
Effect style Point Estimate 95% Wald Confidence Limits
school 2 vs 1 self 2.953 1.476 5.909
school 2 vs 1 team 1.197 0.643 2.230
school 3 vs 1 self 3.724 1.755 7.902
school 3 vs 1 team 1.926 0.990 3.747
program regular vs afternoon self 2.112 1.215 3.670
program regular vs afternoon team 2.101 1.237 3.571
Results
This model doesn’t have a very good fit, because both the R-square and max-rescaled R-square are pretty small. The fitted model is:
$$\operatorname{logit}\hat P(\text{self})=-1.9707+1.0828\,\text{school2}+1.3147\,\text{school3}+0.7474\,\text{regular},$$
and
$$\operatorname{logit}\hat P(\text{team})=-1.3088+0.1801\,\text{school2}+0.6556\,\text{school3}+0.7426\,\text{regular}.$$
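Because 'class' is the reference category, the two fitted logits (for 'self' and 'team') determine all three preference probabilities. The sketch below computes them from the coefficients in the SAS output above for a school 2 student in the regular program (this school/program combination is just an illustrative choice):

```python
import math

# Fitted generalized logits from the SAS output above:
def eta_self(school2, school3, regular):
    return -1.9707 + 1.0828 * school2 + 1.3147 * school3 + 0.7474 * regular

def eta_team(school2, school3, regular):
    return -1.3088 + 0.1801 * school2 + 0.6556 * school3 + 0.7426 * regular

# Illustrative case: school 2, regular program.
e_self = math.exp(eta_self(1, 0, 1))
e_team = math.exp(eta_team(1, 0, 1))
denom = 1.0 + e_self + e_team   # reference category 'class' contributes 1

probs = {"self": e_self / denom, "team": e_team / denom, "class": 1.0 / denom}
assert abs(sum(probs.values()) - 1.0) < 1e-12   # probabilities sum to 1
```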
Interpretation of Beta Coefficients
The estimated odds of preferring a self-paced computer-learning style as opposed to a traditional class approach in school 2 are 2.953 times those in school 1 (or 295.3%).
The estimated odds of preferring a self-paced computer-learning style as opposed to a traditional class approach in school 3 are 3.724 times those in school 1 (or 372.4%).
The estimated odds of preferring a self-paced computer-learning style as opposed to a traditional class approach in the regular program are 2.112 times those in the afternoon program (or 211.2%).
Interpretation of Beta Coefficients
The estimated odds of preferring a team learning approach as opposed to a traditional class approach in school 2 are 1.197 times those in school 1 (or 119.7%).
The estimated odds of preferring a team learning approach as opposed to a traditional class approach in school 3 are 1.926 times those in school 1 (or 192.6%).
The estimated odds of preferring a team learning approach as opposed to a traditional class approach in the regular program are 2.101 times those in the afternoon program (or 210.1%).
Minitab: I’m like 95% Confident I Could Train A Chimp To Do This
R Application: Code
SPSS Application: Syntax
Only numeric values are allowed as SPSS data.
Schools were renumbered (1->3, 2->2, 3->1) to make school 1 the reference.
DATASET NAME DataSet1 WINDOW=FRONT.
NOMREG style (BASE=LAST ORDER=ASCENDING) BY school program
/CRITERIA CIN(95) DELTA(0) MXITER(100) MXSTEP(5) CHKSEP(20) LCONVERGE(0) PCONVERGE(0.000001) SINGULAR(0.00000001)
/MODEL
/STEPWISE=PIN(.05) POUT(0.1) MINEFFECT(0) RULE(SINGLE) ENTRYMETHOD(LR) REMOVALMETHOD(LR)
/INTERCEPT=INCLUDE
/PRINT=PARAMETER SUMMARY LRT CPS STEP MFI.
Relevant SPSS Output
Pseudo R-Square
Cox and Snell .081
Nagelkerke .093
McFadden .041
Parameter Estimates
stylea B Std. Error Wald df Sig. Exp(B) 95% Confidence Interval for Exp(B)
Lower Bound Upper Bound
self
Intercept -1.971 .320 37.842 1 .000
[school=1.00] 1.315 .384 11.727 1 .001 3.724 1.755 7.903
[school=2.00] 1.083 .354 9.360 1 .002 2.953 1.476 5.909
[school=3.00] 0b . . 0 . . . .
[program=1.00] .747 .282 7.027 1 .008 2.112 1.215 3.670
[program=2.00] 0b . . 0 . . . .
team
Intercept -1.309 .260 25.418 1 .000
[school=1.00] .656 .339 3.730 1 .053 1.926 .990 3.747
[school=2.00] .180 .317 .322 1 .570 1.197 .643 2.230
[school=3.00] 0b . . 0 . . . .
[program=1.00] .743 .271 7.533 1 .006 2.101 1.237 3.571
[program=2.00] 0b . . 0 . . . .