Logistic (regression) single and multiple



Overview

Defined: A model for predicting one variable from other variable(s).

Variables: IV(s) are continuous or categorical; DV is dichotomous

Relationship: Prediction of group membership

Example: Can we predict bar passage from LSAT score (and/or GPA, etc.)?

Assumptions: No multicollinearity among the IVs (a linear IV-DV relationship and normality are not assumed)


Comparison to Linear Regression:

Because the outcome is dichotomous, its relationship to the predictors is not linear, so linear regression can’t be used.

With a dichotomous outcome (coded 0 or 1), we are now talking about “probabilities” of the outcome.

So logistic regression is about predicting the probability of the outcome occurring.


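Comparison to Linear Regression:

Logistic is based upon odds: the probability of an event divided by the probability of a non-event.

The “odds ratio,” Exp(b), compares the odds after a one-unit change in the predictor with the odds before it.

For example, if Exp(b) = 2, a one-unit increase in the predictor doubles the odds of the event, e.g. from even odds (.50/.50 = 1) to odds of 2 (.67/.33).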

$$\mathrm{Exp}(b) = \frac{\text{Odds after a unit change in the predictor}}{\text{Odds before a unit change in the predictor}}$$
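The odds arithmetic on this slide can be sketched in a few lines of Python. The function names `odds` and `prob_from_odds` and the starting probability of .50 are assumptions chosen for illustration; they do not come from the slides.

```python
def odds(p):
    """Odds of an event: P(event) / P(non-event)."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1.0 + o)


# Hypothetical starting point: the event is as likely as not (p = .50).
odds_before = odds(0.50)            # even odds, 1.0
exp_b = 2.0                         # odds ratio from the slide's example
odds_after = odds_before * exp_b    # a one-unit change doubles the odds
p_after = prob_from_odds(odds_after)

print(f"odds before = {odds_before:.2f}, odds after = {odds_after:.2f}")
print(f"probability after = {p_after:.2f}")  # .67, i.e. odds of .67/.33
```

Note that Exp(b) multiplies the odds, not the probability: the probability here moves from .50 to .67, not from .50 to 1.00.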


Comparison to Linear Regression:

Single predictor:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i})}}$$

Multiple predictors:

$$P(Y) = \frac{1}{1 + e^{-(b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni})}}$$

Notice the linear regression equation (it appears in the exponent).

e is the base of the natural logarithm (about 2.718).
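A minimal Python sketch of the logistic equation, covering both the single-predictor and multiple-predictor cases. The function name `predict_probability` and all coefficient and predictor values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math


def predict_probability(b0, coefs, xs):
    """P(Y) = 1 / (1 + e^-(b0 + b1*X1 + ... + bn*Xn))."""
    # The linear regression equation sits inside the exponent.
    linear_part = b0 + sum(b * x for b, x in zip(coefs, xs))
    return 1.0 / (1.0 + math.exp(-linear_part))


# Hypothetical coefficients and scores, for illustration only.
p_single = predict_probability(-4.0, [0.5], [8.0])            # linear part = 0
p_multi = predict_probability(-4.0, [0.5, 1.0], [8.0, 1.0])   # linear part = 1

print(f"single predictor:    P(Y) = {p_single:.3f}")
print(f"multiple predictors: P(Y) = {p_multi:.3f}")
```

Whatever the value of the linear part, the result stays between 0 and 1, which is what makes it usable as a probability.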


Comparison to Linear Regression:

Linear = measure of fit was sum of squares
  Summing the squared differences between the line and the actual outcomes

Logistic = measure of fit is log-likelihood
  Summing the log probabilities associated with the predicted and actual outcomes
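As a sketch of how the log-likelihood is built up, the usual form is the sum over cases of Y·ln(P) + (1 − Y)·ln(1 − P), where P is the predicted probability for that case. The outcomes and predicted probabilities below are made up for illustration.

```python
import math


def log_likelihood(outcomes, predicted_probs):
    """Sum of Y*ln(P) + (1 - Y)*ln(1 - P) over all cases."""
    total = 0.0
    for y, p in zip(outcomes, predicted_probs):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


# Made-up data: five cases with observed outcomes (0 or 1).
outcomes = [1, 0, 1, 1, 0]
good_predictions = [0.9, 0.1, 0.8, 0.7, 0.2]   # close to the actual outcomes
poor_predictions = [0.5, 0.5, 0.5, 0.5, 0.5]   # no better than a coin flip

ll_good = log_likelihood(outcomes, good_predictions)
ll_poor = log_likelihood(outcomes, poor_predictions)

print(f"log-likelihood, good predictions: {ll_good:.3f}")
print(f"log-likelihood, poor predictions: {ll_poor:.3f}")
```

The log-likelihood is always negative or zero; values closer to zero mean the predicted probabilities sit closer to the actual outcomes.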


Comparison to Linear Regression:

Linear = overall variance explained by R²

Logistic = overall “variance explained” by…
  -2LL (log-likelihood multiplied by -2; higher means worse fit)
  R²_CS (Cox and Snell’s statistic, for comparison to baseline)
  R²_N (Nagelkerke’s statistic, a variation of R²_CS)
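The three statistics can be sketched from two log-likelihoods: one for the baseline (constant-only) model and one for the model with predictors. The formulas are the commonly cited ones for Cox and Snell and for Nagelkerke; the sample size and log-likelihood values are hypothetical.

```python
import math


def minus_two_ll(ll):
    """-2LL: the log-likelihood multiplied by -2 (higher = worse fit)."""
    return -2.0 * ll


def cox_snell_r2(ll_model, ll_baseline, n):
    """Cox and Snell: 1 - exp((2/n) * (LL_baseline - LL_model))."""
    return 1.0 - math.exp((2.0 / n) * (ll_baseline - ll_model))


def nagelkerke_r2(ll_model, ll_baseline, n):
    """Nagelkerke: Cox and Snell rescaled by its maximum possible value."""
    max_r2 = 1.0 - math.exp((2.0 / n) * ll_baseline)
    return cox_snell_r2(ll_model, ll_baseline, n) / max_r2


# Hypothetical values, for illustration only.
n = 100
ll_baseline = -68.0   # constant-only model
ll_model = -55.0      # model with predictors

r2_cs = cox_snell_r2(ll_model, ll_baseline, n)
r2_n = nagelkerke_r2(ll_model, ll_baseline, n)

print(f"-2LL baseline = {minus_two_ll(ll_baseline):.1f}")
print(f"-2LL model    = {minus_two_ll(ll_model):.1f}")
print(f"Cox and Snell R2 = {r2_cs:.3f}, Nagelkerke R2 = {r2_n:.3f}")
```

Nagelkerke's version is never smaller than Cox and Snell's, because it divides by a maximum that is below 1.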


NOTE: There is no direct analog of R² in logistic analysis.

This is because an R² measure seeks to make a statement about the "percent of variance explained," but the variance of a dichotomous or categorical dependent variable depends on the frequency distribution of that variable.

For a dichotomous dependent variable, for instance, variance is at a maximum for a 50-50 split, and the more lopsided the split, the lower the variance.

This means that R² measures for logistic analysis with differing marginal distributions of their respective dependent variables cannot be compared directly, and comparison of logistic R² measures with R² from OLS regression is also problematic.

Nonetheless, a number of logistic “pseudo” R² measures have been proposed, all of which should be reported as approximations to OLS R², BUT NOT as actual percent of variance explained.
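The point about lopsided splits can be checked with a short sketch. The variance of a 0/1 variable with proportion p of ones is p(1 − p); the function name and the particular splits below are chosen only for illustration.

```python
def dichotomous_variance(p):
    """Variance of a 0/1 variable whose proportion of 1s is p."""
    return p * (1.0 - p)


# Illustrative splits, from 50-50 to very lopsided.
splits = [0.50, 0.70, 0.90, 0.99]
variances = [dichotomous_variance(p) for p in splits]

for p, v in zip(splits, variances):
    print(f"split {p:.2f} / {1 - p:.2f}: variance = {v:.4f}")
```

The variance shrinks steadily as the split moves away from 50-50, which is why pseudo R² values from samples with different outcome distributions are hard to compare.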


Comparison to Linear Regression:

Linear = unique contributions of variable by...
  unstandardized b (for the regression equation)
  standardized b (for interpretation, similar to r)
  significance level (t-test)

Logistic = unique contributions of variable by...
  unstandardized b (for the logistic equation)
  exp(b) (for interpretation, as odds ratio)
  significance level (Wald, using chi-square test)

$$\text{Wald} = \frac{b}{SE_b}$$
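A sketch of the Wald test in Python. It assumes the common convention that the squared ratio b / SE_b is compared to a chi-square distribution with 1 degree of freedom; the coefficient and standard error are hypothetical, and the p-value uses the identity that a 1-df chi-square tail probability equals erfc(sqrt(x / 2)).

```python
import math


def wald_ratio(b, se_b):
    """Wald = b / SE_b, as on the slide."""
    return b / se_b


def wald_chi_square_test(b, se_b):
    """Squared Wald ratio and its p-value from a chi-square with 1 df."""
    wald_squared = wald_ratio(b, se_b) ** 2
    # Upper-tail probability of a 1-df chi-square at wald_squared.
    p_value = math.erfc(math.sqrt(wald_squared / 2.0))
    return wald_squared, p_value


# Hypothetical coefficient and standard error, for illustration only.
b, se_b = 0.8, 0.4
wald_squared, p_value = wald_chi_square_test(b, se_b)

print(f"Wald ratio = {wald_ratio(b, se_b):.2f}")
print(f"chi-square = {wald_squared:.2f}, p = {p_value:.4f}")
```

Statistical packages differ in whether they print the ratio or its square, so it is worth checking which one an output table labels as "Wald".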


Comparison to Linear Regression:

Logistic = unique contributions of variable by...
  unstandardized b (for the logistic equation)
  exp(b) (for interpretation, as odds ratio)
  significance level (Wald, using chi-square test)

(1) Both gre and gpa are significant predictors, while topnotch is not.

(2) For a one-unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increase by .668.

(3) For a one-unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.949.
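Points (2) and (3) describe the same coefficient on two scales, which a short sketch makes concrete. The value .668 is the gpa coefficient quoted above; exponentiating the rounded value gives a number very close to, but not exactly, the 1.949 on the slide, presumably because the original output used an unrounded coefficient (an assumption).

```python
import math

b_gpa = 0.668                      # log-odds coefficient quoted in point (2)
odds_ratio = math.exp(b_gpa)       # exp(b): multiplicative change in the odds
percent_change = (odds_ratio - 1.0) * 100.0

print(f"exp(b) = {odds_ratio:.3f}")
print(f"odds change per one-unit increase in gpa: {percent_change:.1f}%")
```

A positive b gives exp(b) above 1 (odds increase), a negative b gives exp(b) below 1 (odds decrease), and b = 0 gives exp(b) = 1 (no change).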


Comparison to Linear Regression:

Linear = each variable (without controlling)…
  Bivariate correlation

Logistic = each variable (without controlling)…
  Logistic output shows you the following information:


Comparison to Linear Regression:

Linear = different methods…
  Entry
  Hierarchical
  Stepwise

Logistic = different methods…
  Entry (same as with linear regression)
  Hierarchical (same as with linear regression)
  Stepwise (see Field’s textbook page 226)