Logistic Regression: Single and Multiple
Overview Defined: A model for predicting one variable from
other variable(s).
Variables: IV(s) are continuous/categorical; DV is dichotomous
Relationship: Prediction of group membership
Example: Can we predict bar passage from LSAT score (and/or GPA, etc.)?
Assumptions: No multicollinearity among predictors (linearity and normality are not assumed)
Comparison to Linear Regression: Since the outcome is dichotomous,
we can't use linear regression because the relationship is not linear.
With a dichotomous outcome, we are now talking about "probabilities" (of 0 or 1).
So logistic regression is about predicting the probability of the outcome occurring.
Comparison to Linear Regression: Logistic is based upon the "odds",
the probability of an event divided by the probability of the non-event; the odds ratio Exp(b) compares the odds after and before a one-unit change in the predictor.
For example, if Exp(b) = 2, then a one-unit change would make the event twice as likely (odds of .67/.33) to occur.
Exp(b) = (odds after a unit change in the predictor) / (odds before a unit change in the predictor)
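As a sketch of this arithmetic (the probabilities below are the slide's .67/.33 example, not real data):

```python
# Odds and odds-ratio arithmetic behind Exp(b).

def odds(p):
    """Odds = probability of the event / probability of the non-event."""
    return p / (1 - p)

odds_before = odds(0.5)    # even odds: .5/.5 = 1.0
odds_after = odds(2 / 3)   # "twice as likely": .67/.33, odds of 2.0

# Odds ratio for a one-unit change in the predictor
exp_b = odds_after / odds_before
print(exp_b)  # ≈ 2.0
```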
Comparison to Linear Regression: Single predictor
Multiple predictor
Notice the linear regression equation inside the exponent; e is the base of the natural logarithm (about 2.718).
P(Y) = 1 / (1 + e^-(b0 + b1X1i))                              [single predictor]
P(Y) = 1 / (1 + e^-(b0 + b1X1i + b2X2i + ... + bnXni))        [multiple predictors]
Comparison to Linear Regression: Linear = measure of fit was the sum of squares,
summing the squared differences between the line and the actual outcomes.
Logistic = measure of fit is the log-likelihood,
summing the log probabilities associated with the predicted and actual outcomes.
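A sketch of that summation, using toy outcomes and predicted probabilities (not real data):

```python
import math

def log_likelihood(y_actual, p_predicted):
    """Sum of y*ln(p) + (1-y)*ln(1-p) over cases; closer to 0 means better fit."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for y, p in zip(y_actual, p_predicted))

# Toy actual outcomes (0/1) and predicted probabilities.
y = [1, 0, 1, 1]
p = [0.9, 0.2, 0.8, 0.7]
ll = log_likelihood(y, p)
print(ll, -2 * ll)  # -2LL is the fit statistic reported in SPSS-style output
```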
Comparison to Linear Regression: Linear = overall variance explained by R².
Logistic = overall "variance explained" by...
-2LL (log-likelihood score × -2; higher means worse fit)
R²CS (Cox and Snell's statistic, for comparison to baseline)
R²N (Nagelkerke's statistic, a variation of R²CS)
NOTE: There is no direct analog of R2 in logistic analysis.
This is because an R2 measure seeks to make a statement about the "percent of variance explained," but the variance of a dichotomous or categorical dependent variable depends on the frequency distribution of that variable.
For a dichotomous dependent variable, for instance, variance is at a maximum for a 50-50 split, and the more lopsided the split, the lower the variance.
This means that R2 measures for logistic analysis with differing marginal distributions of their respective dependent variables cannot be compared directly, and comparison of logistic R2 measures with R2 from OLS regression is also problematic.
Nonetheless, a number of logistic “pseudo” R2 measures have been proposed, all of which should be reported as approximations to OLS R2, BUT NOT as actual percent of variance explained.
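As a sketch of two of these pseudo-R² measures (the log-likelihoods and sample size below are hypothetical):

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """R²CS = 1 - exp(2 * (LL_null - LL_model) / n), comparing model to baseline."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Rescale Cox & Snell by its maximum possible value so the top is 1."""
    max_r2 = 1 - math.exp(2 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2

# Hypothetical log-likelihoods for a null and a fitted model, n = 100 cases.
r2_cs = cox_snell_r2(-68.0, -55.0, 100)
r2_n = nagelkerke_r2(-68.0, -55.0, 100)
print(r2_cs, r2_n)  # Nagelkerke is always >= Cox & Snell
```

Consistent with the caveat above, neither value is a literal percent of variance explained; they are approximations for reporting alongside -2LL.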
Comparison to Linear Regression: Linear = unique contributions of variable by...
unstandardized b (for the regression equation)
standardized b (for interpretation, similar to r)
significance level (t-test)
Logistic = unique contributions of variable by...
unstandardized b (for the logistic equation)
exp(b) (for interpretation, as odds ratio)
significance level (Wald, using chi-square test)
Wald = b / SEb
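A sketch of the Wald statistic; the coefficient is the slide's gpa value, but the standard error is hypothetical:

```python
def wald(b, se_b):
    """Wald statistic = b / SE_b; its square is compared to a chi-square(1)."""
    return b / se_b

# gpa coefficient from the example; SE is a made-up value for illustration.
z = wald(0.668, 0.245)
print(z, z ** 2)  # z² is the chi-square value reported in SPSS-style output
```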
Comparison to Linear Regression: Logistic = unique contributions of variable
(example: predicting graduate-school admission from gre, gpa, and topnotch):
(1) Both gre and gpa are significant predictors while topnotch is not.
(2) For a one unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increases by .668.
(3) For a one unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.949.
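Interpretations (2) and (3) are the same finding on two scales: exponentiating the log-odds coefficient gives the odds-ratio factor (the small difference from 1.949 is rounding of the reported coefficient):

```python
import math

# gpa coefficient from the example output: +.668 on the log-odds scale.
b_gpa = 0.668
odds_factor = math.exp(b_gpa)
print(round(odds_factor, 3))  # ≈ 1.950, matching the reported factor of 1.949
```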
Comparison to Linear Regression: Linear = each variable (without controlling)...
bivariate correlation
Logistic = each variable (without controlling)... the logistic output shows this information (output table not reproduced here).
Comparison to Linear Regression: Linear = different methods...
Entry, Hierarchical, Stepwise
Logistic = different methods...
Entry (same as with linear regression)
Hierarchical (same as with linear regression)
Stepwise (see Field's textbook, page 226)