
© Willett, Harvard University Graduate School of Education, 8/4/2015 S052/II.2(b) – Slide 1 S052/II.2(b): Applied Data Analysis Roadmap of the Course –


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 1

S052/II.2(b): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


More details can be found in the “Course Objectives and Content” handout on the course webpage.

Multiple Regression Analysis (MRA): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

• Do your residuals meet the required assumptions? Test for residual normality; use influence statistics to detect atypical datapoints.

• If your residuals are not independent, replace OLS by GLS regression analysis, use individual growth modeling, or specify a multi-level model.

• If your sole predictor is continuous, MRA is identical to correlational analysis.

• If your sole predictor is dichotomous, MRA is identical to a t-test.

• If your several predictors are categorical, MRA is identical to ANOVA.

• If time is a predictor, you need discrete-time survival analysis…

• If your outcome is categorical, you need to use…
  • Binomial logistic regression analysis (dichotomous outcome)
  • Multinomial logistic regression analysis (polytomous outcome) ← Today’s Topic Area
  • Discriminant Analysis

• If you have more predictors than you can deal with, create taxonomies of fitted models and compare them; form composites of the indicators of any common construct; conduct a Principal Components Analysis; use Cluster Analysis.

• If your outcome vs. predictor relationship is non-linear, transform the outcome or predictor, or use non-linear regression analysis.


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 2

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Printed Syllabus – What Is Today’s Topic?


Please check inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of today’s class when you pre-read the day’s materials.

Today, I will:

• Introduce the multinomial logistic regression model, distinguishing it from the binomial logistic regression model.
• Fit a taxonomy of multinomial logistic regression models.
• Compare and contrast the output obtained in a multinomial and a binomial logit analysis.
• Explain an additional test (“Type III Analysis of Effects”) that is available in a multinomial logit analysis.
• Test and interpret a fitted multinomial logistic regression model.


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 3

Dataset ALT_RTS_GIRLS.txt

Overview Sub-sample of girls from the NELS:88 dataset in which girls are tracked as they exit from high school (by dropout, receipt of GED, or graduation) and enter college (no college, community college, four-year college).

Source Kurlaender, M. (2003). Reinforcing Disadvantage or Increasing Opportunity: Alternative Routes to Educational Attainment. Unpublished Doctoral Thesis, Harvard University Graduate School of Education, in process. 

Sample size 5418 females

Info NELS:88 is a national longitudinal data-collection effort managed by the National Center for Education Statistics (NCES), providing longitudinal data on trends and transitions in young people’s lives as they develop, attend school, and embark on their careers. Data were collected from students, parents, teachers, and high school principals, and from school records. The survey began with an 8th-grade cohort in 1988. Cognitive tests (math, science, reading, history) were administered in the base year (1988), at the first follow-up (1990), and at the second follow-up (1992). Third and fourth follow-up data were collected in 1994 and 2000. All school dropouts who could be located were retained in the study.

Information accompanying today’s dataset is in ALT_RTS_GIRLS_info.pdf.

[Photo: Radcliffe, Class of ’57]

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Introducing the Alternative Routes to Education Dataset


Broad Research Question: How is entry into college impacted by race/ethnicity and socio-economic status?


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 4

Structure of Dataset

Col# | Var Name | Variable Description | Variable Metric/Labels
1 | ID | Respondent’s NELS ID |
2 | COLLEGE | Type of institution selected by the respondent for their initial postsecondary education | Polychotomous variable (1 = None, 2 = Tech/Voc or Community College, 3 = Four-Year College)
3 | HSGRAD | Respondent’s high school completion status | Polychotomous variable (1 = HS Diploma, 2 = GED, 3 = Dropout)
4 | READ | Respondent’s performance on a standardized test of reading in 8th grade | Continuous variable (mean 53.22, standard deviation 13.47, in the full sample)
5 | MATH | Respondent’s performance on a standardized test of math in 8th grade | Continuous variable (mean 53.40, standard deviation 13.56, in the full sample)
6 | WHITE | Is the respondent Caucasian? | Dichotomous: 0 = no; 1 = yes
7 | BLACK | Is the respondent African-American? | Dichotomous: 0 = no; 1 = yes
8 | LATINO | Is the respondent Hispanic? | Dichotomous: 0 = no; 1 = yes
9 | SES | Respondent’s composite socio-economic status in 8th grade (including family income, parental education and occupation, household possessions) | Continuous variable (mean 2.92, standard deviation 0.79, in the full sample)

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Printed Syllabus – What Is Today’s Topic?


Callouts: COLLEGE is the polychotomous categorical outcome variable; the race/ethnicity indicators (WHITE, BLACK, LATINO) and SES are the principal question predictors.


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 5

To model the relationship between a polychotomous outcome, like COLLEGE (which has three categories – “Four-Year College,” “Community College,” & “No College”) and a predictor like SES, we use the same “logit” approach that we have already developed …


$P(\text{COLLEGE} = \text{????}) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 SES)}}$

We still use the logistic regression function, containing the usual parameters and predictors, to represent the right-hand side of the model.

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis How Do You Model The Relationship Between A Polytomous Outcome & Predictors?


But, because the outcome is no longer a dichotomy, we have to do something about the left-hand side of the model.

Under the multinomial logit approach, we simultaneously model the relationship between the predictors and two outcome probabilities:

• Probability of going to community college vs. not going to college.
• Probability of going to 4-year college vs. not going to college.

[Figure: hypothesized curves for P(Comm Coll vs. No Coll) and P(4-Year vs. No Coll), each plotted against SES on a 0-to-1 probability scale]


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 6

So, the hypothesized multinomial logit model simply becomes a simultaneous collection of two parts …

Both parts of the new multinomial model are fitted simultaneously to the data, with parameter estimates and goodness-of-fit statistics interpreted in the usual way …

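$P(\text{4-Year College vs. No College}) = \dfrac{1}{1 + e^{-(\beta_0^{(3)} + \beta_1^{(3)} SES)}}$

$P(\text{Community College vs. No College}) = \dfrac{1}{1 + e^{-(\beta_0^{(2)} + \beta_1^{(2)} SES)}}$

The superscripts index the non-reference category in each comparison: (3) for COLLEGE = 3 (four-year college) and (2) for COLLEGE = 2 (community college), each contrasted with COLLEGE = 1 (no college).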

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis How Do You Model The Relationship Between A Polytomous Outcome & Predictors?

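To see how the two simultaneous parts fit together, here is a minimal Python sketch, assuming SES is the sole predictor. The coefficient values are hypothetical placeholders, not estimates from the Alternative Routes data. Each pairwise probability follows the logistic form above, while the three category probabilities come from the generalized-logit (baseline-category) formulation with “No College” as the reference; the sketch checks that the two views agree.

```python
import math

# HYPOTHETICAL coefficients (not fitted values): (intercept, SES slope)
# for each non-reference category; COLLEGE=1 ("No College") is the reference.
COEFS = {
    2: (-0.5, 0.6),   # Community College vs. No College
    3: (-2.0, 1.2),   # 4-Year College vs. No College
}

def linear_predictor(k, ses, coefs=COEFS):
    """beta_0^(k) + beta_1^(k) * SES for comparison k vs. the reference."""
    b0, b1 = coefs[k]
    return b0 + b1 * ses

def pairwise_prob(k, ses, coefs=COEFS):
    """P(COLLEGE = k | COLLEGE in {1, k}): the logistic curve for one comparison."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(k, ses, coefs)))

def category_probs(ses, coefs=COEFS):
    """Generalized-logit probabilities for all three categories (they sum to 1)."""
    expo = {k: math.exp(linear_predictor(k, ses, coefs)) for k in coefs}
    denom = 1.0 + sum(expo.values())          # the reference contributes exp(0) = 1
    probs = {1: 1.0 / denom}
    probs.update({k: v / denom for k, v in expo.items()})
    return probs

if __name__ == "__main__":
    for ses in (1.5, 2.5, 3.5):               # illustrative SES values
        p = category_probs(ses)
        print(ses, {k: round(v, 3) for k, v in p.items()},
              "P(4yr vs none) =", round(pairwise_prob(3, ses), 3))
```

Note that each pairwise probability equals the category probability renormalized within its two-category comparison, e.g. P(3)/(P(1) + P(3)).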


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 7

```sas
*---------------------------------------------------------------------------*
 Input the data, name and label the variables in the dataset
*---------------------------------------------------------------------------*;
DATA ALT_RTS_GIRLS;
  INFILE 'C:\DATA\S052\ALT_RTS_GIRLS.txt';
  INPUT ID COLLEGE HSGRAD READ MATH WHITE BLACK LATINO SES;
  LABEL COLLEGE = 'Institution Selected for Postsec Ed'
        BLACK   = 'Is respondent African-American?'
        LATINO  = 'Is respondent Hispanic?'
        WHITE   = 'Is respondent Caucasian?'
        SES     = 'Socio-economic status';

  * Create a single categorical variable to represent race/ethnicity;
  IF BLACK=1 THEN RACE=1;
  IF LATINO=1 THEN RACE=2;
  IF WHITE=1 THEN RACE=3;

  * Create the required two-way interactions between RACE and SES;
  BxSES = BLACK*SES;
  LxSES = LATINO*SES;
  WxSES = WHITE*SES;
RUN;

PROC FORMAT;
  VALUE CFMT 1='No Postsec Ed' 2='Tech/Voc or Comm Coll' 3='4-Year College';
  VALUE RFMT 1='Black' 2='Latino' 3='White';
RUN;
```

Data-Analytic Handout II_2b_1 …

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Reading The Alternative Routes Data Into PC-SAS

• Standard input statements.
• Create a set of two-way interactions needed in subsequent logistic regression analyses.
• Create a categorical variable representing race/ethnicity for use in subsequent tabulations.
• Format selected categorical variables for use in subsequent tabulations.
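For readers working outside SAS, here is a rough Python analogue of the derived variables created in the DATA step, assuming each record is a dict keyed by the variable names in the INPUT statement. It mirrors the sequential IF statements (a later match overwrites an earlier one) and leaves RACE missing (None) when no indicator equals 1, as SAS leaves it missing. The function name and the example record are hypothetical.

```python
def derive_variables(rec):
    """Return a copy of rec with RACE and the RACE-by-SES interactions added."""
    out = dict(rec)
    race = None                      # SAS would leave RACE missing if no IF matches
    if rec["BLACK"] == 1:
        race = 1
    if rec["LATINO"] == 1:
        race = 2
    if rec["WHITE"] == 1:
        race = 3
    out["RACE"] = race
    # Two-way interactions between each race/ethnicity dummy and SES
    out["BxSES"] = rec["BLACK"] * rec["SES"]
    out["LxSES"] = rec["LATINO"] * rec["SES"]
    out["WxSES"] = rec["WHITE"] * rec["SES"]
    return out

# HYPOTHETICAL record, values invented for illustration only
example = {"ID": 1, "COLLEGE": 3, "HSGRAD": 1, "READ": 55.0, "MATH": 52.0,
           "WHITE": 0, "BLACK": 0, "LATINO": 1, "SES": 2.5}
print(derive_variables(example))
```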


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 8

```sas
*---------------------------------------------------------------------------*
 Obtaining statistics on COLLEGE choice for the different racial/ethnic groups
*---------------------------------------------------------------------------*;
* Cross-tabulation of COLLEGE and RACE;
PROC FREQ DATA=ALT_RTS_GIRLS;
  TABLE COLLEGE*RACE / NOROW CHISQ CELLCHI2;
  FORMAT RACE RFMT. COLLEGE CFMT.;
RUN;

* Distribution of SES by COLLEGE and RACE;
PROC TABULATE DATA=ALT_RTS_GIRLS;
  CLASS COLLEGE RACE;
  VAR SES;
  TABLE (COLLEGE*(RACE ALL)), (SES*(P5 MEDIAN P95));
  FORMAT RACE RFMT. COLLEGE CFMT.;
RUN;
```

First, let’s conduct exploratory analyses to examine the bivariate relationships between the polychotomous outcome COLLEGE and the predictors RACE and SES, using classical contingency-table analysis …

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Programming Exploratory Data Analysis In The Alternative Routes Dataset

• (RACE ALL): compute statistics by RACE, and for ALL of the sample combined.
• (P5 MEDIAN P95): estimate the 5th percentile, median, and 95th percentile of SES for each group.
• PROC TABULATE: obtain descriptive statistics on continuous variable SES within each of the COLLEGE by RACE subgroups (this is useful for subsequent plotting of prototypical fitted trend lines).
• CHISQ: compute the χ² statistic.
• CELLCHI2: compute the cell contributions to the χ² statistic.
• NOROW: eliminate row percentages.
• PROC FREQ: standard two-way contingency-table analysis of the bivariate relationship between categorical variables COLLEGE and RACE.


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 9

```
Table of COLLEGE (Institution Selected for Postsec Ed) by RACE

Cell entries: Frequency / Cell Chi-Square / Percent / Col Pct

                        Black      Latino     White      Total
No Postsec Ed           120        192        716        1028
                        1.1294     16.74      4.6433
                        2.21       3.54       13.22      18.97
                        20.91      25.46      17.51
Tech/Voc or Comm Coll   225        365        1467       2057
                        0.2297     21.656     4.7421
                        4.15       6.74       27.08      37.97
                        39.20      48.41      35.87
4-Year College          229        197        1907       2333
                        1.3351     50.206     12.077
                        4.23       3.64       35.20      43.06
                        39.90      26.13      46.63
Total                   574        754        4090       5418
                        10.59      13.92      75.49      100.00

Statistics for Table of COLLEGE by RACE

Statistic                      DF      Value     Prob
Chi-Square                      4   112.7585   <.0001
Likelihood Ratio Chi-Square     4   117.5525   <.0001
Mantel-Haenszel Chi-Square      1    41.2598   <.0001
```

Standard test associated with a two-way contingency analysis:

1. The sample χ² statistic compares the observed frequencies to the frequencies expected under the null hypothesis.
2. The statistic is computed as follows:

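$\chi^2_{\text{Pearson}} = \sum_{\text{all cells}} \dfrac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

where, under the null hypothesis of no relationship, each cell’s expected frequency is Exp = (row total × column total) / N.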
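A minimal Python sketch of the Pearson statistic, using the observed frequencies from the PROC FREQ table. It computes the expected counts under independence and sums the cell contributions; it should reproduce the Chi-Square value (about 112.76 on 4 df) that SAS reports.

```python
# Observed counts from the PROC FREQ output
# (rows: No Postsec Ed, Tech/Voc or Comm Coll, 4-Year College;
#  columns: Black, Latino, White)
OBSERVED = [
    [120, 192, 716],
    [225, 365, 1467],
    [229, 197, 1907],
]

def expected_counts(table):
    """Expected frequencies under independence: row total * column total / N."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]

def pearson_chi2(table):
    """Sum over all cells of (Obs - Exp)^2 / Exp."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for orow, erow in zip(table, exp)
               for o, e in zip(orow, erow))

chi2 = pearson_chi2(OBSERVED)
df = (len(OBSERVED) - 1) * (len(OBSERVED[0]) - 1)   # (3 - 1) * (3 - 1)
print(f"chi2 = {chi2:.4f}, df = {df}")
```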

H0: COLLEGE and RACE are not related, in the population.

Test statistic & p-value: χ² = 112.76 (df = 4), p < .0001

Decision: Reject H0.

Conclusion: In the population, race/ethnicity is related to a girl’s choice of type of college.
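The p-value can be checked by hand because, for an even number of degrees of freedom, the chi-square upper-tail probability has a closed form. This Python sketch implements only that even-df shortcut (an assumption that holds for this 4-df table, not a general chi-square routine) and applies it to the reported statistic.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with EVEN df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

p = chi2_sf_even_df(112.7585, 4)
print(f"p = {p:.3g}")             # far below .0001, consistent with SAS's "<.0001"
```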

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Classical Contingency Table Analysis In The Alternative Routes Dataset



© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 10


A useful diagnostic tool that can help you determine where the detected relationship really resides …

Cell contribution to χ² for that cell $= \dfrac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

Table χ² = sum of the cell contributions:

$\chi^2 = 1.1294 + 16.740 + 4.6433 + 0.2297 + 21.656 + 4.7421 + 1.3351 + 50.206 + 12.077 = 112.7585$

Examine each cell for a “large” contribution to the χ² statistic. The story is really about Latinos:

• More Latinos than expected are going to community college, or not going to college at all.
• Fewer Latinos than expected are going to 4-year college.

However, there are also a few more Whites than expected going to 4-year college.
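The cell-by-cell inspection can be automated. This Python sketch recomputes each cell’s contribution from the observed counts and sorts the cells from largest to smallest; the sign of Obs − Exp shows whether a cell has more (+) or fewer (−) girls than expected under independence.

```python
# Observed counts from the PROC FREQ output (rows = COLLEGE, columns = RACE)
ROWS = ["No Postsec Ed", "Tech/Voc or Comm Coll", "4-Year College"]
COLS = ["Black", "Latino", "White"]
OBSERVED = [
    [120, 192, 716],
    [225, 365, 1467],
    [229, 197, 1907],
]

def cell_contributions(table):
    """Return {(row, col): ((Obs-Exp)^2/Exp, Obs-Exp)} for every cell."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    out = {}
    for i, r in enumerate(ROWS):
        for j, c in enumerate(COLS):
            exp = row_tot[i] * col_tot[j] / n
            obs = table[i][j]
            out[(r, c)] = ((obs - exp) ** 2 / exp, obs - exp)
    return out

contrib = cell_contributions(OBSERVED)
# Largest contributions first; "+" means more girls than expected, "-" fewer
for (r, c), (chi, diff) in sorted(contrib.items(), key=lambda kv: -kv[1][0]):
    print(f"{r:<22} {c:<7} contribution={chi:8.4f}  obs-exp={diff:+8.1f}")
```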

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Classical Contingency Table Analysis In The Alternative Routes Dataset



© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 11

```
Socio-economic status (SES), by Institution Selected for Postsec Ed and RACE

Institution Selected     RACE      P5      Median   P95
for Postsec Ed
No Postsec Ed            Black     0.99    1.96     3.19
                         Latino    1.06    1.81     2.99
                         White     1.45    2.38     3.42
                         All       1.34    2.28     3.34
Tech/Voc or Comm Coll    Black     1.30    2.41     3.55
                         Latino    1.42    2.33     3.68
                         White     1.79    2.84     3.81
                         All       1.62    2.74     3.76
4-Year College           Black     1.63    2.98     4.15
                         Latino    1.49    2.68     4.13
                         White     2.20    3.37     4.39
                         All       1.97    3.28     4.36
```

Useful to know when we produce prototypical fitted plots.

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Tabulation of SES, by COLLEGE and RACE



© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 12

```sas
*---------------------------------------------------------------------------*
 Fitting A Taxonomy Of Nested Multinomial Logit Models
*---------------------------------------------------------------------------*;
PROC LOGISTIC DATA=ALT_RTS_GIRLS;
  M1: MODEL COLLEGE(ref='1') = BLACK LATINO / LINK=GLOGIT EXPB RSQUARE;
RUN;

PROC LOGISTIC DATA=ALT_RTS_GIRLS;
  M2: MODEL COLLEGE(ref='1') = SES / LINK=GLOGIT EXPB RSQUARE;
RUN;

PROC LOGISTIC DATA=ALT_RTS_GIRLS;
  M3: MODEL COLLEGE(ref='1') = BLACK LATINO SES / LINK=GLOGIT EXPB RSQUARE;
RUN;

PROC LOGISTIC DATA=ALT_RTS_GIRLS;
  M4: MODEL COLLEGE(ref='1') = BLACK LATINO SES BxSES LxSES / LINK=GLOGIT EXPB RSQUARE;
RUN;
```

PROC LOGISTIC can fit models for several kinds of categorical outcome:

• Binomial logit (by default, if the outcome is dichotomous).
• Ordinal (cumulative) logit (by default, if the outcome has more than two categories, which are then treated as ordered).
• Multinomial logit, if you choose the GLOGIT (“generalized logit”) “link” function.
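Under the GLOGIT link with three outcome categories, every predictor enters both logits, so each term costs J − 1 = 2 parameters. This Python sketch counts parameters for the four models in the taxonomy and the degrees of freedom for a likelihood-ratio comparison of nested pairs. The −2 Log L values themselves are not shown in this section, so only the df are computed here; the helper names are illustrative.

```python
# Number of outcome categories for COLLEGE
J = 3

# Predictors in each model of the taxonomy (from the PROC LOGISTIC code)
MODELS = {
    "M1": ["BLACK", "LATINO"],
    "M2": ["SES"],
    "M3": ["BLACK", "LATINO", "SES"],
    "M4": ["BLACK", "LATINO", "SES", "BxSES", "LxSES"],
}

def n_params(predictors, n_categories=J):
    """An intercept plus one slope per predictor, for each of the J-1 logits."""
    return (n_categories - 1) * (len(predictors) + 1)

def lr_test_df(reduced, full):
    """Degrees of freedom for a likelihood-ratio test of nested models."""
    if not set(MODELS[reduced]) <= set(MODELS[full]):
        raise ValueError(f"{reduced} is not nested in {full}")
    return n_params(MODELS[full]) - n_params(MODELS[reduced])

for name, preds in MODELS.items():
    print(name, "parameters:", n_params(preds))
print("M3 vs M1:", lr_test_df("M1", "M3"), "df")
print("M4 vs M3:", lr_test_df("M3", "M4"), "df")
```

M1 and M2 are not nested in each other, so they are compared only through their common parent, M3.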

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis Fitting A Taxonomy of Multinomial Logit Models To The Alternative Routes Data

When fitting a multinomial model, you must specify the common reference category that will be used in each of the simultaneous binomial comparisons. Here, I have chosen “1” (“no college”) as the reference category (ref='1'). Any of the available categories can be selected – it’s a substantive choice, not a statistical one.
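Why is the reference category a substantive rather than a statistical choice? Switching the baseline only re-expresses each comparison as a difference of coefficients; the fitted category probabilities are unchanged. This Python sketch demonstrates that with hypothetical coefficients (not estimates from these data), re-referencing from “No College” (1) to “4-Year College” (3).

```python
import math

def probs_from_logits(etas):
    """Category probabilities from logits relative to a reference (eta_ref = 0)."""
    denom = sum(math.exp(v) for v in etas.values())
    return {k: math.exp(v) / denom for k, v in etas.items()}

# HYPOTHETICAL (intercept, SES slope) pairs with COLLEGE=1 as the reference
REF1 = {1: (0.0, 0.0), 2: (-0.5, 0.6), 3: (-2.0, 1.2)}

def rereference(coefs, new_ref):
    """Re-express every comparison relative to `new_ref` by subtraction."""
    b0_ref, b1_ref = coefs[new_ref]
    return {k: (b0 - b0_ref, b1 - b1_ref) for k, (b0, b1) in coefs.items()}

REF3 = rereference(REF1, 3)   # now 4-Year College is the baseline

ses = 2.5
p1 = probs_from_logits({k: b0 + b1 * ses for k, (b0, b1) in REF1.items()})
p3 = probs_from_logits({k: b0 + b1 * ses for k, (b0, b1) in REF3.items()})
print("coefficients vs. 4-Year College:", REF3)
print("probabilities (ref=1):", p1)
print("probabilities (ref=3):", p3)
```

In SAS, the same re-expression would come from refitting with ref='3'; the estimates change, but the likelihood and fitted probabilities do not.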


© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 13

The LOGISTIC Procedure  Model Information Data Set WORK.ALT_RTS_GIRLSResponse Variable COLLEGE Institution Selected for Postsec EdNumber of Response Levels 3Number of Observations 5418Model generalized logitOptimization Technique Fisher's scoring   Response Profile  Ordered Total Value COLLEGE Frequency  1 1 1028 2 2 2057 3 3 2333  Logits modeled use COLLEGE=1 as the reference category.   Model Convergence Status  Convergence criterion (GCONV=1E-8) satisfied.

The LOGISTIC Procedure  Model Information Data Set WORK.ALT_RTS_GIRLSResponse Variable COLLEGE Institution Selected for Postsec EdNumber of Response Levels 3Number of Observations 5418Model generalized logitOptimization Technique Fisher's scoring   Response Profile  Ordered Total Value COLLEGE Frequency  1 1 1028 2 2 2057 3 3 2333  Logits modeled use COLLEGE=1 as the reference category.   Model Convergence Status  Convergence criterion (GCONV=1E-8) satisfied.

Examine the output for Model M1, containing the main effects of only predictors BLACK and LATINO …Examine the output for Model M1, containing the main effects of only predictors BLACK and LATINO …

The history of the fitting process looks very similar to

that produced in regular logistic regression analysis,

but contains interesting distinguishing information

The history of the fitting process looks very similar to

that produced in regular logistic regression analysis,

but contains interesting distinguishing information

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis 1st Page Of Output For Any Model Confirms That A Multinomial Logit Model Has Been Fit

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis 1st Page Of Output For Any Model Confirms That A Multinomial Logit Model Has Been Fit

Confirms that a multinomial logit multinomial logit modelmodel has been fitted

Confirms that a multinomial logit multinomial logit modelmodel has been fitted

Confirms that the “no college” option (COLLEGE=1) is being used as the reference categoryreference category.

Confirms that the “no college” option (COLLEGE=1) is being used as the reference categoryreference category.

Tells you that three levelsthree levels have been detected in the outcome variable.

Tells you that three levelsthree levels have been detected in the outcome variable.
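The Response Profile is enough to reproduce the “Intercept Only” -2LL reported for these data (11333.060). With no predictors, the maximum-likelihood fitted probabilities are simply the sample proportions of the three outcome categories. Here is a Python sketch that assumes only the three frequencies shown above:

```python
import math

# Response Profile for COLLEGE (1 = no college, 2 = comm. coll., 3 = 4-year college)
counts = {1: 1028, 2: 2057, 3: 2333}
n = sum(counts.values())

# Intercept-only fit: each girl's fitted probability for her own category is c / n,
# so -2LL = -2 * sum over categories of c * log(c / n).
neg2ll_null = -2.0 * sum(c * math.log(c / n) for c in counts.values())

print(n, f"{neg2ll_null:.3f}")
```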

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Overall Fit Of The Model Is Assessed In The Usual Way By The -2LL Statistic

Model Fit Statistics
                     Intercept     Intercept and
Criterion            Only          Covariates
-2 Log L             11333.060     11215.508

R-Square 0.0215      Max-rescaled R-Square 0.0245

Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio     117.5525      4     <.0001

$$-2LL_{\text{Null Model}} = 11333.06, \qquad -2LL_{\text{Model M1}} = 11215.51$$
$$\Delta(-2LL) = 11333.06 - 11215.51 = 117.55$$

H0: The simultaneous effect of all predictors in Model M1 (race predictors, BLACK and LATINO) on a girl’s choice of college is zero, in the population.
Test Statistic: χ² = 117.55 (df = 4, because each of the two race predictors has a coefficient in each of the two logits), p < .0001
Decision: Reject H0
Conclusion: In the population, a girl’s college choice depends on her race.
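The likelihood-ratio test above can be reproduced by hand. The Python sketch below uses the closed-form upper tail of a chi-square distribution. That closed form holds only for an even number of degrees of freedom, which is the case here (df = 4). `chi2_sf_even_df` is a hypothetical helper written for this illustration, not a SAS or library routine.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with positive, even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

neg2ll_null, neg2ll_m1 = 11333.060, 11215.508
lr_stat = neg2ll_null - neg2ll_m1  # difference in -2LL
df = 2 * 2                         # 2 race predictors x 2 logits
p_value = chi2_sf_even_df(lr_stat, df)
print(f"chi-square = {lr_stat:.2f}, df = {df}, p = {p_value:.2e}")
```

With df = 4, the same helper returns roughly 0.05 at 9.49. That is the critical value used when Models M3 and M4 are compared in the final taxonomy.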

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Parameter Estimates, And Ancillary Statistics, Are Present To Excess

Analysis of Maximum Likelihood Estimates
                                          Standard    Wald
Parameter    COLLEGE    DF    Estimate    Error       Chi-Square    Pr > ChiSq    Exp(Est)
Intercept    2          1      0.7173     0.0456      247.5623      <.0001        2.049
Intercept    3          1      0.9796     0.0438      499.5382      <.0001        2.663
BLACK        2          1     -0.0887     0.1219      0.5294        0.4668        0.915
BLACK        3          1     -0.3334     0.1209      7.6013        0.0058        0.717
LATINO       2          1     -0.0749     0.1001      0.5594        0.4545        0.928
LATINO       3          1     -0.9539     0.1105      74.5501       <.0001        0.385

All parameter estimates are present in pairs because there were two outcome comparisons.

Under “COLLEGE” is recorded the label of the “upper” category: “2” represents community college, “3” represents 4-year college.

There are two sets of parameter estimates, one for each facet of the multinomial model:

$$\hat{p}_{\text{Comm. Coll. vs. No Coll.}} = \frac{1}{1 + e^{-(0.717 - 0.089\,\text{BLACK} - 0.075\,\text{LATINO})}}$$

$$\hat{p}_{\text{4-Year Coll. vs. No Coll.}} = \frac{1}{1 + e^{-(0.980 - 0.333\,\text{BLACK} - 0.954\,\text{LATINO})}}$$
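Each facet's fitted probability is a conditional one. For example, the community-college equation gives the probability of choosing community college among girls who choose either community college or no college. The Python sketch below plugs the Model M1 estimates into both equations for the three prototypical girls. The helper names are our own.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Model M1 estimates; COLLEGE = 1 ("no college") is the reference category.
M1 = {
    "Comm. Coll. vs. No Coll.": {"Intercept": 0.7173, "BLACK": -0.0887, "LATINO": -0.0749},
    "4-Year Coll. vs. No Coll.": {"Intercept": 0.9796, "BLACK": -0.3334, "LATINO": -0.9539},
}
PROTOTYPES = {"White": (0, 0), "Black": (1, 0), "Latina": (0, 1)}  # (BLACK, LATINO)

def fitted_p(facet, black, latino):
    """Fitted probability for one facet of the multinomial model."""
    b = M1[facet]
    return logistic(b["Intercept"] + b["BLACK"] * black + b["LATINO"] * latino)

for facet in M1:
    row = {g: round(fitted_p(facet, *x), 3) for g, x in PROTOTYPES.items()}
    print(facet, row)
```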

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Antilogged Parameter Estimates Are Also Provided, To Be Interpreted As Odds-Ratios

Odds Ratio Estimates
                     Point       95% Wald
Effect    COLLEGE    Estimate    Confidence Limits
BLACK     2          0.915       0.721    1.162
BLACK     3          0.717       0.565    0.908
LATINO    2          0.928       0.763    1.129
LATINO    3          0.385       0.310    0.478

When odds-ratios are less than unity, it’s best to invert them for interpretive purposes, but remember to invert the interpretation too:

$$\frac{1}{0.385} = 2.60$$
“The fitted odds that a Caucasian girl will go to Four-Year college (vs. choosing no college at all) are 2.6 times the fitted odds that a Latina girl will have the same outcome.”

$$\frac{1}{0.717} = 1.39$$
“The fitted odds that a Caucasian girl will go to Four-Year College (vs. not going to college at all) are 1.39 times the fitted odds that an African-American girl will have the same outcome.”

The confidence intervals for BLACK and LATINO in the COLLEGE = 2 comparison both cover unity (the “null” value for an odds-ratio), and so:
We cannot reject the null hypothesis in either case.
There are no statistically significant differences between White girls and either Black or Latina girls in the odds of going to community college/vocational training (i.e., outcome = 2) rather than to no college.
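The odds-ratios and Wald limits in this table follow directly from the estimates and standard errors on the previous slide, as exp(b) and exp(b ± 1.96 × SE). The Python sketch below recomputes two of them. It then inverts the LATINO odds-ratio for the four-year-college comparison; note that inverting a confidence interval swaps its limits. The 1.96 multiplier is the usual rounded normal quantile. That is an assumption on our part, but it reproduces the SAS limits to three decimals here.

```python
import math

Z_975 = 1.96  # rounded standard-normal quantile for a 95% interval

def wald_odds_ratio(estimate, se, z=Z_975):
    """Odds-ratio point estimate and Wald limits: exp(b), exp(b - z*se), exp(b + z*se)."""
    return math.exp(estimate), math.exp(estimate - z * se), math.exp(estimate + z * se)

def invert(odds_ratio, lower, upper):
    """Invert an odds-ratio; its confidence limits swap places."""
    return 1.0 / odds_ratio, 1.0 / upper, 1.0 / lower

black_comm = wald_odds_ratio(-0.0887, 0.1219)  # BLACK, COLLEGE = 2
latino_4yr = wald_odds_ratio(-0.9539, 0.1105)  # LATINO, COLLEGE = 3
white_vs_latina_4yr = invert(*latino_4yr)

print("BLACK, COLLEGE=2:", [round(v, 3) for v in black_comm])
print("LATINO, COLLEGE=3:", [round(v, 3) for v in latino_4yr])
print("White vs. Latina, 4-year:", [round(v, 2) for v in white_vs_latina_4yr])
```

Once inverted, the four-year-college interval for White vs. Latina girls runs from roughly 2.09 to 3.22.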

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Final Taxonomy Of Fitted Multinomial Logit Models

Taxonomy of fitted multinomial logit models of the relationship between a girl’s choice of postsecondary institution and her race/ethnicity and socioeconomic status (n=5418).

                             Null        M1          M2          M3          M4
Technical/Vocational and Community College vs. None
  Intercept                              0.717***   -1.594***   -1.964***   -2.032***
  BLACK                                 -0.089                   0.375**     0.748~
  LATINO                                -0.075                   0.449***    0.523
  SES                                                0.917***    1.017***    1.051***
  BLACK×SES                                                                 -0.188
  LATINO×SES                                                                -0.067
Four Year College vs. None
  Intercept                              0.980***   -4.854***   -5.129***   -5.682***
  BLACK                                 -0.333**                 0.568***    1.844***
  LATINO                                -0.954***                0.120       2.317***
  SES                                                2.053***    2.127***    2.318***
  BLACK×SES                                                                 -0.495*
  LATINO×SES                                                                -0.846***
-2LL                         11333.1     11215.5     9937.5      9901.5      9864.7

Key: ~ p<.10; * p<.05; ** p<.01; *** p<.001

$$H_0:\ \beta_{\text{BLACK}\times\text{SES}}^{\{\text{Comm. Coll. vs. No Coll.}\}} = 0,\quad \beta_{\text{LATINO}\times\text{SES}}^{\{\text{Comm. Coll. vs. No Coll.}\}} = 0,\quad \beta_{\text{BLACK}\times\text{SES}}^{\{\text{4-Year Coll. vs. No Coll.}\}} = 0,\quad \beta_{\text{LATINO}\times\text{SES}}^{\{\text{4-Year Coll. vs. No Coll.}\}} = 0$$

Test statistic: difference in -2LL between Models M3 and M4
χ² = (9901.5 - 9864.7) = 36.8
Critical value: χ²(df=4; α=.05) = 9.49 (df = 4: two interaction terms in each of two logits)
Decision: Reject H0
Conclusion: Controlling for the main effects of race/ethnicity and socioeconomic status, the relationship between socioeconomic status and girls’ post-secondary education choices differs among African-American, Latina and Caucasian girls, in the population.

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
There Are Additional “Type III” Tests, If You Want To Test The Impact Of A Single Predictor On The Joint Outcome

Type III Analysis of Effects (Model M4)
                Wald
Effect    DF    Chi-Square    Pr > ChiSq
BLACK     2     12.4539       0.0020
LATINO    2     29.8929       <.0001
SES       2     789.6798      <.0001
BxSES     2     6.4279        0.0402
LxSES     2     36.1789       <.0001

$$H_0:\ \beta_{\text{BLACK}\times\text{SES}}^{\{\text{Comm. Coll. vs. No Coll.}\}} = 0,\quad \beta_{\text{BLACK}\times\text{SES}}^{\{\text{4-Year Coll. vs. No Coll.}\}} = 0$$

Test Statistic: χ² = 6.43 (df = 2), p = 0.04
Decision: Reject H0
Conclusion: In the population, the relationship between SES and postsecondary education choice differs for Black and White girls.

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
There Are Additional “Type III” Tests, If You Want To Test The Impact Of A Single Predictor On The Joint Outcome

Using the LxSES row of the same Type III analysis of Model M4:

$$H_0:\ \beta_{\text{LATINO}\times\text{SES}}^{\{\text{Comm. Coll. vs. No Coll.}\}} = 0,\quad \beta_{\text{LATINO}\times\text{SES}}^{\{\text{4-Year Coll. vs. No Coll.}\}} = 0$$

Test Statistic: χ² = 36.18 (df = 2), p < .0001
Decision: Reject H0
Conclusion: In the population, the relationship between SES and postsecondary education choice differs for Latina and White girls.

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Writing Down Fitted Models Is Straightforward

Fit statistics for the taxonomy (coefficient estimates as in the taxonomy above):

                             Null        M1          M2          M3          M4
-2LL                         11333.1     11215.5     9937.5      9901.5      9864.7
AIC                          11337.1     11227.5     9945.5      9917.5      9888.7
R²                           0           0.022       0.227       0.232       0.237

We can recover prototypical fitted equations in the usual way, but now there are two sets (shown here for Model M4):

$$\hat{p}_{\text{Comm. Coll. vs. No Coll.}} = \frac{1}{1 + e^{-(-2.03^{***} + 0.75^{\sim}\,\text{BLACK} + 0.52\,\text{LATINO} + 1.05^{***}\,\text{SES} - 0.19\,\text{BLACK}\times\text{SES} - 0.07\,\text{LATINO}\times\text{SES})}}$$

$$\hat{p}_{\text{4-Year Coll. vs. No Coll.}} = \frac{1}{1 + e^{-(-5.68^{***} + 1.84^{***}\,\text{BLACK} + 2.32^{***}\,\text{LATINO} + 2.32^{***}\,\text{SES} - 0.50^{*}\,\text{BLACK}\times\text{SES} - 0.85^{***}\,\text{LATINO}\times\text{SES})}}$$
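The AIC row can be reproduced from the -2LL row, because AIC = -2LL + 2 × (number of estimated parameters). In a multinomial model with three outcome categories, the intercept and every predictor each get one coefficient in each of the two logits. The Python sketch below assumes that M2 contains SES alone, as the blank cells in the taxonomy indicate.

```python
# -2LL for each model, and the number of predictors it contains (intercept not counted)
NEG2LL = {"Null": 11333.1, "M1": 11215.5, "M2": 9937.5, "M3": 9901.5, "M4": 9864.7}
N_PREDICTORS = {"Null": 0, "M1": 2, "M2": 1, "M3": 3, "M4": 5}
N_LOGITS = 2  # COLLEGE = 2 vs. 1 and COLLEGE = 3 vs. 1

def aic(neg2ll, q, n_logits=N_LOGITS):
    """AIC = -2LL + 2 * k, where k = n_logits * (q + 1) estimated parameters."""
    return neg2ll + 2 * n_logits * (q + 1)

aics = {m: round(aic(NEG2LL[m], N_PREDICTORS[m]), 1) for m in NEG2LL}
print(aics)
```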

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Producing Fitted Plots For Prototypical Individuals Is Just The Same

Fitted probabilities from Model M4, by family SES:

       Comm. Coll. vs. None          4-Year Coll. vs. None
SES    White     Black     Latino    White     Black     Latino
1.0              0.396277                      0.117637
1.1              0.417097  0.394939            0.137917  0.148578
1.2              0.438217  0.418679            0.161055  0.168177
1.3              0.459564  0.442801            0.187232  0.189786
1.4              0.481059  0.467197            0.216565  0.213459
1.5    0.388054  0.502625  0.491751  0.099302  0.249085  0.239213
1.6    0.413285  0.524181  0.516344  0.122046  0.284713  0.267019
1.7    0.438981  0.545648  0.540859  0.149135  0.323245  0.296796
1.8    0.465007  0.566946  0.565177  0.180998  0.364335  0.328407
1.9    0.491226  0.587999  0.589186  0.217925  0.407502  0.361652
2.0    0.517493  0.608735  0.612777  0.259994  0.452147  0.396277
2.1    0.543663  0.629087  0.635851  0.306996  0.497575  0.431974
2.2    0.569595  0.648989  0.658316  0.358381  0.543043  0.468392
2.3    0.595151  0.668387  0.680093  0.413237  0.587805  0.50515
2.4    0.620201  0.68723   0.701112  0.470335  0.631161  0.541852
2.5    0.644626  0.705473  0.721316  0.52822   0.672497  0.578105
2.6    0.668321  0.723082  0.74066   0.585356  0.711319  0.613536
2.7    0.691192  0.740025  0.759109  0.640285  0.747269  0.647804
2.8    0.713164  0.756282  0.776641  0.691768  0.780126  0.680615
2.9    0.734173  0.771835  0.793245  0.738889  0.809798  0.71173
3.0    0.754174  0.786675  0.808919  0.781085  0.836307  0.740967
3.1    0.773135  0.8008    0.823668  0.818141  0.85976   0.768204
3.2    0.791039  0.81421   0.837508  0.850127  0.880334  0.793377
3.3    0.807881  0.826912  0.850458  0.877332  0.898247  0.816468

[Figure: two panels, “Community College vs. No College” and “4-Year College vs. No College”, each plotting the fitted probability (0 to 1) against family SES (0.5 to 4.5), with separate curves for White (W), Black (B) and Latino (L) girls.]

Among female adolescents who do not go to four-year college:
High SES youth are more likely to enroll in community college than low SES youth.
At each level of SES, B & L youth have a similar and higher probability of enrolling in community college than W youth, with the difference larger at low SES.

Among female adolescents who do not go to community college:
High SES youth are more likely to go to 4-yr college than low SES youth, and these differences are greater than the corresponding effects for enrollment in community college.
At low SES, all youth have a lower probability of going to four-year college than to community college, but B & L youth have a similar and higher probability of enrolling in a 4-yr college than W youth.
At high SES, B & W youth have a similar and higher probability of enrolling in a four-year college than do L youth.
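The table of fitted probabilities can be regenerated from the Model M4 estimates in the taxonomy. The Python sketch below evaluates each facet at SES values from 1.0 to 3.3 in steps of 0.1. The helper names are our own, not SAS. It reproduces the tabled values to about four decimals; the small gaps presumably reflect the three-decimal rounding of the estimates. Unlike the slide, it fills every cell, including the low-SES cells that the slide leaves blank.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Model M4 estimates: (Intercept, BLACK, LATINO, SES, BxSES, LxSES)
M4 = {
    "comm": (-2.032, 0.748, 0.523, 1.051, -0.188, -0.067),  # Comm. Coll. vs. None
    "four": (-5.682, 1.844, 2.317, 2.318, -0.495, -0.846),  # 4-Year Coll. vs. None
}
RACE = {"White": (0, 0), "Black": (1, 0), "Latino": (0, 1)}  # (BLACK, LATINO)

def fitted(facet, black, latino, ses):
    """Fitted probability for one facet; the SES slope depends on race via the interactions."""
    b0, b_black, b_latino, b_ses, b_bxses, b_lxses = M4[facet]
    z = (b0 + b_black * black + b_latino * latino
         + (b_ses + b_bxses * black + b_lxses * latino) * ses)
    return logistic(z)

print("SES  " + "  ".join(f"{f}:{r}" for f in M4 for r in RACE))
for tenths in range(10, 34):  # SES = 1.0, 1.1, ..., 3.3
    ses = tenths / 10
    cells = [fitted(f, *RACE[r], ses) for f in M4 for r in RACE]
    print(f"{ses:.1f}  " + "  ".join(f"{p:.6f}" for p in cells))
```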

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Appendix I: An Algebraic Aside On The Inversion Of Odds-ratios

Odds-ratios can be inverted; you just have to get the interpretation correct …

$$\text{OR}_{\text{Whites vs. Latinos}} = \frac{1}{\text{OR}_{\text{Latinos vs. Whites}}}$$
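Here the inversion is written out with the LATINO coefficient for the four-year-college comparison in Model M1 (estimate -0.954). On the logit scale, inverting an odds-ratio simply flips the sign of the coefficient. This is a worked sketch, not the author's original slide algebra:

$$
\text{OR}_{\text{Whites vs. Latinos}}
= \frac{\text{Odds}_{\text{Whites}}}{\text{Odds}_{\text{Latinos}}}
= \frac{1}{\text{Odds}_{\text{Latinos}} / \text{Odds}_{\text{Whites}}}
= \frac{1}{\text{OR}_{\text{Latinos vs. Whites}}}
= \frac{1}{e^{\hat\beta_{\text{LATINO}}}}
= e^{-\hat\beta_{\text{LATINO}}}
= e^{0.954} \approx 2.60
$$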

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Appendix II: Does Using Multinomial Logistic Regression Have Advantages Over Using Multiple Binomial Logistic Regressions?

A comparison of results obtained when fitting a multinomial logit versus a pair of binomial logit models to the same data can be found in Data-Analytic Handout II_2b_2 …

*--------------------------------------------------------------------------------*
 Input the data, name and label the variables in the dataset
*--------------------------------------------------------------------------------*;
DATA ALT_RTS_GIRLS;
  INFILE 'C:\DATA\S052\ALT_RTS_GIRLS.txt';
  INPUT ID COLLEGE HSGRAD READ MATH WHITE BLACK LATINO SES;
  LABEL COLLEGE = 'Institution Selected for Postsec Ed'
        BLACK   = 'Is respondent African-American?'
        LATINO  = 'Is respondent Hispanic?'
        WHITE   = 'Is respondent Caucasian?'
        SES     = 'Socio-economic status';
  * Create the required two-way interactions between RACE and SES;
  BxSES = BLACK*SES;
  LxSES = LATINO*SES;
  * Create a pair of new dichotomous outcomes to replace polytomous COLLEGE;
  IF COLLEGE=2 THEN COMCOLL=1; ELSE COMCOLL=0;
  IF COLLEGE=3 THEN FOURYR=1;  ELSE FOURYR=0;
RUN;

* Format the new outcomes;
PROC FORMAT;
  VALUE CCFMT 0='No Postsec Ed' 1='Tech/Voc or Comm Coll';
  VALUE FYFMT 0='No Postsec Ed' 1='4-Year College';
RUN;

*--------------------------------------------------------------------------------*
 Are the New Binomial Outcomes Independent?
*--------------------------------------------------------------------------------*;
PROC FREQ DATA=ALT_RTS_GIRLS;
  FORMAT COMCOLL CCFMT. FOURYR FYFMT.;
  TABLES COMCOLL*FOURYR / CHISQ;
RUN;

Create a pair of new dichotomous outcomes: COMCOLL indicates whether the student went to community college; FOURYR indicates whether the student went to a four-year college. The comparison group is the “no postsecondary education” group, in each case.

Use contingency-table analysis to examine the relationship between the two new dichotomous outcomes, COMCOLL and FOURYR.

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Appendix II: Does Using Multinomial Logistic Regression Have Advantages Over Using Multiple Binomial Logistic Regressions?

Notice the interesting relationship between the two newly-created dichotomous outcomes …

Table of COMCOLL by FOURYR
(each cell: Frequency, Percent, Row Pct, Col Pct)

                          FOURYR
COMCOLL                   No Postsec Ed    4-Year College    Total
No Postsec Ed             1028             2333              3361
                          18.97            43.06             62.03
                          30.59            69.41
                          33.32            100.00
Tech/Voc or Comm Coll     2057             0                 2057
                          37.97            0.00              37.97
                          100.00           0.00
                          66.68            0.00
Total                     3085             2333              5418
                          56.94            43.06             100.00

Statistics for Table of COMCOLL by FOURYR
Statistic                      DF    Value        Prob
Chi-Square                     1     2507.6352    <.0001
Likelihood Ratio Chi-Square    1     3267.1645    <.0001
Continuity Adj. Chi-Square     1     2504.8049    <.0001
Mantel-Haenszel Chi-Square     1     2507.1723    <.0001
Phi Coefficient                      -0.6803

The same baseline group appears in both binomial comparisons.
Empty cell: no girl can be in both the community-college group and the four-year-college group.
Reject H0 and conclude that COMCOLL and FOURYR are not independent, in the population.
The phi coefficient is equivalent to Pearson’s correlation coefficient for a pair of dichotomous variables: r = -0.68.
So, we conclude that the two new dichotomous outcomes are strongly correlated.
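For a 2×2 table, the phi coefficient and the chi-square statistics in this output are simple functions of the four cell counts: Pearson's χ² = nφ², and the Mantel-Haenszel χ² = (n - 1)φ². Here is a Python sketch that reproduces them from the frequencies above:

```python
import math

# Cell counts from the COMCOLL by FOURYR table
a, b = 1028, 2333  # COMCOLL = 0 row: FOURYR = 0, FOURYR = 1
c, d = 2057, 0     # COMCOLL = 1 row: FOURYR = 0, FOURYR = 1 (the empty cell)
n = a + b + c + d

# Phi coefficient: Pearson's correlation between two 0/1 variables
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# For a 2x2 table, both chi-square statistics follow directly from phi
pearson_chi2 = n * phi ** 2
mantel_haenszel_chi2 = (n - 1) * phi ** 2

print(f"phi = {phi:.4f}, Pearson chi-square = {pearson_chi2:.2f}, "
      f"MH chi-square = {mantel_haenszel_chi2:.2f}")
```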

Page 25: © Willett, Harvard University Graduate School of Education, 8/4/2015S052/II.2(b) – Slide 1 S052/II.2(b): Applied Data Analysis Roadmap of the Course –

© Willett, Harvard University Graduate School of Education, 04/19/23 S052/II.2(b) – Slide 25

*--------------------------------------------------------------------------------* Fit a binomial logit model for the community college vs. no postsec comparison;*--------------------------------------------------------------------------------*;* Pick out the sub-sample for the community college vs. no postsec comparison; DATA ALT_RTS_GIRLS_COMCOLL; SET ALT_RTS_GIRLS; IF COLLEGE NE 3;* Fit the associated binomial logit model; PROC LOGISTIC DATA=ALT_RTS_GIRLS_COMCOLL; M4A: MODEL COMCOLL(event='1')= BLACK LATINO SES BxSES LxSES; *---------------------------------------------------------------------------------* Fit the binomial logit model for the four-year college vs. no postsec comparison;*--------------------------------------------------------------------------------*;* Pick out the sub-sample for the four year vs. no postsec comparison; DATA ALT_RTS_GIRLS_FOURYR; SET ALT_RTS_GIRLS; IF COLLEGE NE 2;* Fit the associated binomial logit model; PROC LOGISTIC DATA=ALT_RTS_GIRLS_FOURYR; M4B: MODEL FOURYR(event='1')= BLACK LATINO SES BxSES LxSES;

S052/II.2(b): Extensions of the Basic Logit Approach/Multinomial Logit Analysis
Appendix II: Does Using Multinomial Logistic Regression Have Advantages Over Using Multiple Binomial Logistic Regressions?

Two separate binomial logistic regression analyses: COMCOLL vs. no postsecondary education, and FOURYR vs. no postsecondary education.

                 Estimates                     Standard Errors
Term             Multinomial  Twin Binomial    Multinomial  Twin Binomial

Technical/Vocational and Community College vs. None
Intercept              -2.03          -2.25          0.206          0.218
BLACK                   0.75           0.84          0.447          0.468
LATINO                  0.52           0.62          0.388          0.407
SES                     1.05           1.13          0.079          0.083
BLACK×SES              -0.19          -0.22          0.193          0.202
LATINO×SES             -0.07          -0.09          0.169          0.177

Four-Year College vs. None
Intercept              -5.68          -5.27          0.243          0.262
BLACK                   1.84           1.70          0.531          0.559
LATINO                  2.32           2.03          0.465          0.494
SES                     2.32           2.18          0.087          0.094
BLACK×SES              -0.50          -0.46          0.213          0.225
LATINO×SES             -0.85          -0.76          0.189          0.201

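Sample size is largest in the multinomial analysis, because it fits both contrasts simultaneously to the full sample, whereas each binomial analysis uses only the two outcome categories it compares.

So, the standard errors are smaller in the multinomial analysis for every coefficient in the table.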
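One way to quantify the standard-error claim is to divide each twin binomial SE by its multinomial counterpart; a ratio above 1 means the multinomial SE is smaller. The Python sketch below does that arithmetic on values transcribed from the table on this slide. The dictionary layout and the row labels are hypothetical shorthand, and the calculation says nothing beyond what the reported numbers show.

```python
# Standard errors transcribed from the slide's table, stored as
# (multinomial, twin binomial) pairs. The contrast and term labels are
# hypothetical shorthand for the table's row headings.
SE = {
    "CC vs none": {
        "Intercept": (0.206, 0.218),
        "BLACK": (0.447, 0.468),
        "LATINO": (0.388, 0.407),
        "SES": (0.079, 0.083),
        "BLACKxSES": (0.193, 0.202),
        "LATINOxSES": (0.169, 0.177),
    },
    "4yr vs none": {
        "Intercept": (0.243, 0.262),
        "BLACK": (0.531, 0.559),
        "LATINO": (0.465, 0.494),
        "SES": (0.087, 0.094),
        "BLACKxSES": (0.213, 0.225),
        "LATINOxSES": (0.189, 0.201),
    },
}


def se_ratios(table):
    """Return {(contrast, term): twin-binomial SE / multinomial SE}."""
    return {
        (contrast, term): twin / multi
        for contrast, rows in table.items()
        for term, (multi, twin) in rows.items()
    }


ratios = se_ratios(SE)
for (contrast, term), ratio in ratios.items():
    print(f"{contrast:12s} {term:11s} {ratio:.3f}")

# A ratio above 1 means the multinomial SE is the smaller of the two.
print("multinomial SE smaller everywhere:", all(r > 1 for r in ratios.values()))
```

On these numbers the ratios run from about 1.05 (BLACK×SES, community college contrast) to about 1.08 (SES, four-year contrast), so the twin binomial standard errors are roughly 5 to 8 percent larger.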

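Different samples are employed in the multinomial and the twin binomial approaches.

The parameter estimates do not shift in one consistent direction: for the community college vs. none contrast, every twin binomial estimate is larger in magnitude than its multinomial counterpart, whereas for the four-year college vs. none contrast, every multinomial estimate is the larger in magnitude.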
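The direction of the differences can be checked contrast by contrast by comparing absolute values. The Python sketch below uses estimates transcribed from the table on this slide; the dictionary layout and labels are hypothetical shorthand, and the check only describes the reported numbers, not why they differ.

```python
# Coefficient estimates transcribed from the slide's table, stored as
# (multinomial, twin binomial) pairs; the labels are hypothetical shorthand.
EST = {
    "CC vs none": {
        "Intercept": (-2.03, -2.25),
        "BLACK": (0.75, 0.84),
        "LATINO": (0.52, 0.62),
        "SES": (1.05, 1.13),
        "BLACKxSES": (-0.19, -0.22),
        "LATINOxSES": (-0.07, -0.09),
    },
    "4yr vs none": {
        "Intercept": (-5.68, -5.27),
        "BLACK": (1.84, 1.70),
        "LATINO": (2.32, 2.03),
        "SES": (2.32, 2.18),
        "BLACKxSES": (-0.50, -0.46),
        "LATINOxSES": (-0.85, -0.76),
    },
}


def compare_magnitudes(table):
    """For each contrast, count coefficients that are larger in absolute
    value under the multinomial fit vs. under the twin binomial fits, and
    check whether the two approaches agree in sign."""
    summary = {}
    for contrast, rows in table.items():
        multi_larger = sum(abs(m) > abs(t) for m, t in rows.values())
        twin_larger = sum(abs(t) > abs(m) for m, t in rows.values())
        same_sign = all(m * t > 0 for m, t in rows.values())
        summary[contrast] = (multi_larger, twin_larger, same_sign)
    return summary


for contrast, result in compare_magnitudes(EST).items():
    print(contrast, result)
```

Every coefficient keeps its sign under both approaches; what changes is magnitude, and it changes in opposite directions for the two contrasts.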
