36
EOCUMENT RESUME ED 050 169 TM 000 565 AUTHCE TITLE INSTITUTION PUE EATE NOTE EDES PRICE DESCRIPTORS AESTEACT Tuckman, Howard F. The Use of Predictive Models in Forecasting Student Choice. Florida State Univ., Tallahassee. Inst. for Social Research. Fet 71 35p. ; Paper presented at the Annual Meeting cf the American Educational Research Association, New York, New York, February 1971 EDES Price MF-$0.65 HC-$3.29 College Chcice, Cross Cultural Studies, Cultural Factors, Educational EEnetits, Ethnic Groups, Higher Education, High Schools, *Income, *Models, *Multiple Eegression Analysis, Pest Secondary Education, Predictive Measurement, *Predictor Variables, Probability Theory, Seniors, Socioeconomic Status, Student Costs, Student Research This paper uses ordinary least squares regrEssicn to cttain probabilities ±cr the post-graduation choices of high school seniors, and it presents an illustration of the use cf these probabilities in calculating future income. Problems raised by the use ot the least squares regression are discussed. The benefits of higher education and ways in which they may he used as predictors are considered. The estimates presented are based upon data collected by a questionnaire administered to 2453 public high school seniors in Dade County, Florida. (GS)

EOCUMENT RESUME ED 050 169 TM 000 565 TITLE The … · Although several econone.sts have examined the benefits of schooling, to my knowledge, few have attempted to demonstIte empirically

Embed Size (px)

Citation preview

EOCUMENT RESUME

ED 050 169 TM 000 565

AUTHCETITLE

INSTITUTION

PUE EATENOTE

EDES PRICEDESCRIPTORS

AESTEACT

Tuckman, Howard F.The Use of Predictive Models in Forecasting StudentChoice.Florida State Univ., Tallahassee. Inst. for SocialResearch.Fet 7135p. ; Paper presented at the Annual Meeting cf theAmerican Educational Research Association, New York,New York, February 1971

EDES Price MF-$0.65 HC-$3.29College Chcice, Cross Cultural Studies, CulturalFactors, Educational EEnetits, Ethnic Groups,Higher Education, High Schools, *Income, *Models,*Multiple Eegression Analysis, Pest SecondaryEducation, Predictive Measurement, *PredictorVariables, Probability Theory, Seniors,Socioeconomic Status, Student Costs, Student Research

This paper uses ordinary least squares regrEssicn tocttain probabilities ±cr the post-graduation choices of high schoolseniors, and it presents an illustration of the use cf theseprobabilities in calculating future income. Problems raised by theuse ot the least squares regression are discussed. The benefits ofhigher education and ways in which they may he used as predictors areconsidered. The estimates presented are based upon data collected bya questionnaire administered to 2453 public high school seniors inDade County, Florida. (GS)

U.S. DEPARTMENT DF HEALTH. EDUCATION& WELFARE

OFFICE OF EDUCATIONTHIS DOCUMENT HAS BEEN REPRODUCEDEXACTLY AS RECEIVED FROM THE PERSON ORORGANIZATION ORIGINATING IT. POINTS OFVIEW OR OPINIONS STATED DO NOT NECES-SARILY REPRESENT OFFICIAL OFFICE OF EDU-CATION POSITION OR POLICY

The Use of Predictive Models in Forecasting Student Choice

Howard P. Tuckman*

To date, most research on post-high school plans of seniors has

concentrated on identifying the determinants of college choice or has

simply compared the characteristics of college and non-college students.

But our understanding of the selection process increases if several post-

high school alternatives are examined. This paper uses ordinary least

squares regression to obtain probabilities for the post-high school

choices of high school seniors, and it presents an illustration of the

use of these probabilities in calculating future incomes. In the process,

we examine several problems raised by the use of the least squares re-

gression.

Herbert Simon suggests that "the viewpoint is becoming more and

more prevalent that the appropriate scientific model of the world is not

a deterministic model but a probabilistic one."' In analyzing post-high

ill) school choices, the challenge becomes one of finding a set of variables

twO which provide reasonably accurate predictions and of clearly identifying

the post-high school alternatives considered by students. In the next

(:)section, we consider the benefits of higher education and several ways

<=0 that these benefits can be used as predictors. We then use an adjusted

Cs.1111.11.

*The author is an Assistant Professor in the Department of Economics

and a Research Associate in the Institute for Social Research at the FloridaState University. He gratefully acknowledges the research assistance ofVictor Steeb, the programming assistance of Adele Spielberger, and the help-

Erg ful comments of Warren Mazek and John Chang.

1

2

ordinary least squares (OLS) technique to estimate post-high school choices

for 2,453 Dade County seniors. The conditional probabilities obtained are

then utilized to predict the future incomes of high school seniors with

select characteristics.

Benefits of Higher Education as a Predictor of Choice

Although several econone.sts have examined the benefits of schooling,

to my knowledge, few have attempted to demonstIte empirically that these

benefits have an effect on 5t-dent choice. The rapidly expanding litera-

ture on human capital, while useful in linking investment in physical and

human capital, has not as yet succeeded in establishing that rates of re-

turn from schooling affect a student's post-high school plans. Moreover,

no methods have been devised to separate the effects of current and future

benefits on student's choice of an educational path.

In this section we consider three alternative methods for incorporat-

ing perceived benefits in a predictive model. These approaches are not

exhaustive and several others might have been discussed. My major purpose

here is to estabLIIh the importance of the benefits in determining choice.

Net Beneats as a Predictor of Choice

The net benefits approach begins with the assumption that students

select that educational path that offers the greatest net benefits to them.

We shall use the term "benefit" to refer to anything that increases the

satisfaction of the student.2

Included in our definition would be any-

thing improving the student's future earning power, anything increasing

his current satisfactions or reducing the costs that he might otherwise

2

3

have tc incur. The value of the benefits a student expects to receive from

further education depends upon his time preferences for current and future

returns, the way he weighs the various goods and services provided by in-

stitutions of higher education and upon the motivation, intelligence and

attitudes he possesses.

We can identify several benefits of education. One benefit is the

financial return received from further education. Second, we might in-

clude the "financial option" return which involves the opportunity for

students to obtain still further education. Goods and services received

by the student while he obtains his education provide a third source of

benefits. Moreover, non-market benefits and "opportunity options" (i.e.,

the additional employment opportunities offered to the educated) might

also be identified. If a student takes these benefits into account in

making post-high school plans then an increase in one or all of them

should increase the probability that he continues his education.

The financial returns from higher education are well known although

they may be measured several ways.3

A careful formulation of these re-

quires the researcher to allow for ability and motivational differences

among students, and for other characteristics affecting future earnings.

Weisbrod has suggested that completing one level of schooling pro-

vides a student with the opportunity to continue on to further education.4

The "financial option" attributes to investment in one educational path a

fraction of the net financial returns from further education. This approach

suggests that the benefits of completing a four-year college should include

the value of the financial option to continue on to graduate school. The

3

4

value of an option to pursue further schooling depends upon the likelihood

that it will be exercised and upon its expected value when exercised.

Moreover, this value will be greater the further down the educational

ladder the student is. While Weisbrod has shown that a monetary value

can be assigned to the financial option there is no evidence to indicate

that this option plays a significant role in the choice process.

A student selecting among colleges or college types considers dif-

ferent bundles of goods and services. Some empirical evidence of this

has been compiled by Astin. 5Colleges, and to a lesser extent, other

forms of higher education produce both investment and consumption services.

In deciding whether to attend a college and which college to attend, a

student chooses among the bundles of goods and services available at each

of several colleges and other goods provided by the market place. For

example, a student interested in finding good books to read may do so

through the market place by joining one of the many "great books" clubs.

Thus, if all he desires is the opportunity to read, he need not go to

college. Similarly, a student interested in a college education and a

good social life may find this several ways. He might go to the local

junior college and join a local social club, or he might choose to attend

a social university away from home.

Some services are provided both by the market place and by colleges.

Others may only be accessible to college attendees. If the former are

influential in determining whether a high school student chooses a college

career as against an alternative path, then a rise in the price of college

relative to market price diminishes the likelihood of college attendance.

4

5

If market services are poor substitutes for college services, a change

in their price will have little effect on high school student choice.

Consider two alternative methods for valuing the goods and services

(hereafter called services) received at college. The first assumes that

the value of the services received by a student must at least equal the

price he pays to attend the college. This method provides no guidelines

for separating investment and consumption purchases. Alternatively, if

the services received by a student can be identified, they might then be

valued using the price of equivalent services in the private sector. Thus,

current services could presumably be included either in a single net hene-

fits measure or they might be entered as separate parameters in a model

used to predict student choice.

Higher education increases the alternative job possibilities open

to the student and permits him to choose among a larger and more varied

number of jobs. Several of these jobs involve non-monetary rewards. For

example, the clergyman, social worker, or civil servant may receive bene-

fits from their jobs in the form of psychic satisfaction or security.

While these are difficult to quantify, they no doubt figure in some stu-

dents' choice of educational path.

Non-market benefits pose valuation problems both for the researcher

and for the student. Questions arise concerning the value of being away

from home, of maturity before entering the labor force, of exposure to a

broad range of new ideas, etc. These benefits are important insofar as

they are taken into account by the decision-maker. Since students (and

their parents) have limited knowledge of the future, it seems reasonable

5

6

to assume that benefits of this type will be ignored or at least heavily

discounted.

The above discussion suggests the difference in the approach taken

by economists and psychologists. Since economists take tastes as given

the preferred variables in their models are the prices of alt,..rnative goods

and services. Psychologists, however, utilize personality traits to pre-

dict choice. Both approaches could be used to examine the importance of

net benefits as a predictor of choice.

Consider two alternative possibilities for operationalizing the net

benefits approach. One involves systematic identification and cataloging

of benefits at different colleges. For example, students might be asked

to indicate the benefits they expect to receive at each of several schools.

We might then test the hypothesis that a student would choose that school

where the net benefits (total benefits less costs) were greatest.

Alternatively, statistical techniques such as regression or discrim-

inant analysis might be used to predict the weights given to each set of

benefits. Work by Holland and Nichols, and Holland has demonstrated that

the post-high school plans of seniors are related to a variety of person-

ality characteristics.6

It may be possible to link the benefit selection

procedure directly to these characteristics but further attention must

also be given to providing a set of benefits that have meaning in the

market place.

Price as a Predictor of Choice

Economic theory assumes that individuals maximize their utility

subject to an income constraint. Given a set of services available in

6

7

the economy, the rational consumer chooses among them according to the

satisfactions they yield. Eventually he arrives at an equilibrium point

by allocating his income among the services in such a way as to insure

that no other set of services leaves him better off.

We have already noted that several services are purchased by a stu-

dent seeking further education. Unless the student does not receive

satisfaction from these services his decision as to whether to attend a

college will depend upon the price of the college and on the price of

equivalent services which are available outside the college. This suggests

a demand function of the form

(1) D = f(Pc, P1,..., Pi,..., Pn)

where the demand for a college is a function of its own price (Pc) and of

the price of equivalent services elsewhere (P1 to Pn). Since equivalent

servicesmaybesuppliedatothercollegessomeoftheP:s represent prices

of other colleges. Using price theory we might then forecast that the quan-

tity of higher education demanded will increase when the price of that education

decreases or when the cost of equivalent services increases.

In a recent paper based upon data for California high school seniors,

Hoenack showed that meaningful predictive equations can be estimated for

colleges in the University of California (U.C.) system. Unfortunately, he

was unable to extend his analysis to include private and public non-U.C.

schools and thus failed to include the prices of schools outside the U.C.

system in his equations. Moreover, his formulation failed to provide a

method for predicting individual behavior but rather utilized the high

school as basis for prediction. Hoenack's method is of some interest to

8

those wishing to predict demand for state colleges but it is unsatisfactory

for describing choice among a broader range of higher education alternatives.

Operationalizing the price approach poses several problems. The

services provided by institutions are not obvious nor is it clear which

services are complements and which substitutes. While researchers can study

student choice over a period of time to determine which prices affect choice

the task will not be easy. Among Dade County students, for example, we found

that almost 82% of the students applying to one school failed to apply to a

second and 85% failed to apply to a third.7

These findings suggest that

students may first set an acceptable price and then shop for a college.

Alternatively, they may suggest that students lack information about al-

ternatives. In either event, they leave the researcher with little infor-

mation as to which prices are the relevant ones to utilize in empirical

studies.

If the prices of market-provided services are to be included in a

predictive model they must first be identified. By assuming that the value

of benefits received by a student while at college varies directly with the

prices of consumer goods, one might use the consumer price index to reflect

changes in market-provided goods. But while a general rise in consumer

prices raises the cost of obtaining similar benefits outside the college

environment the net effect of this on demand will depend upon whether

col'Bge price rises less than the general price index.

Student Preferences as a Predictor of Choice

A third method of capturing benefits utilizes student responses to

indicate whether college activities are desirable. If students desire the

8

9

current and/or future services attainable at colleges they will be more

likely to choose a college path than if they are interested only in market-

provided services. Moreover, if four-year colleges differ from junior

colleges, and if junior colleges differ from vocational schools in the

type of services they provide, then infr,,mation on the activities desired

by a student will aid in predicting his choice. While this method does

not capture all of the benefits suggested in the benefits approach, it

provides a first approximation to the effects we are after.

The difficulties of this approach are well known. Student responses

may be dependent upon the wording of the question. Moreover, students

may differ in the degree of certainty with which they hold a view. Further,

in some cases it may be desirable to work with a large number of questions

dealing with different aspects of higher education environments rather than

relying on a single measure.

In attempting to formulate the model in the next section several

alternative benefit measures were considered. Our objective was to form-

ulate a reasonable set of predictors bearing some resemblance to services

provided by higher education institutions. Several indices of current

services were considered and dropped in favor of a set of dummy variables

consisting of the student's expressed desire for: 1) Social activities

such as clubs, Greek organizations, dating and dancing; 2) Special inter-

est activities such as art, theater, journalism or photography; 3) Political

activities including both student government and/or radical. politics; 4)

Future career related activities such as membership in a business honorary

or opportunities to work with future employers; 5) College sports of either

9

10

a participatory or viewing nature; and 6) All other activities (such as

getting to know other people, living with people of the same age, etc).

In the followf..g F.,ections we report the results of an attempt to

use OLS regression to determine probabilities for the alternative educa-

tional choices. These findings should be viewed as preliminary since the

model has not been tested for interaction effects and for prediction bias.

The former problem is especially important and has been explored in a

somewhat similar context elsewhere.8

Data Description

The estimates presented in this paper are based upon data collected

from high school seniors in Dade County (Miami) Florida. A questionnaire

was administered to a random sample of high school seniors intending to

graduate in the spring of 1970. Completed questionnaires were obtained

from 18% or 2,453 of the 13,542 high school seniors enrolled in the public

school system. Initial plans called for a 20% random sample but some

attrition occurred due to absenteeism and to incomplete or improperly

completed questionnaires. The methods used for selecting the sample and

a more detailed analysis of student responses may be found in a recent

study by Grigg, et al.9

Means and standard deviations of the variables

used in the regression appear in Appendix Table 1.

The following variables were selected from among the variables

shown in Appendix Table 2 using the extra sum of squares principle.10

They explain a significant amount of the variation in at least one re-

gzession.

10

11

x1

- a dummy variable denoting that a student is male

x2 - a dummy variable denoting that a student is black

x3 a dummy variable denoting that a student is Cuban

x4

- a dummy variable denoting that a student ranks in the topquarter of his class

x5

- a dummy variable denoting that a student ranks in the secondquarter of his class

x6

- a dummy variable denoting that a student ranks in the thirdquarter of his class

x7

- a dummy variable denoting that a student's father has someeducation beyond high school

x8

- a dummy variable denoting that a student's father has acollege degree

x9

a dummy variable denoting that father's income falls between$3,000 and $4,999

x10

- a dummy variable denoting that father's income falls between$5,000 and $9,999

x11

- a dummy variable denoting that father's income falls between$10,000 and $14,999

x12

- a dummy variable denoting that father's income is $15,000or more

x13

a dummy variable denoting a major

x14

life

a dummy variable denoting a majorinterest activities

interest in college social

interest in college special

x15

- a dummy variable denoting a major interest in college politicalactivities

x16

- a dummy variable denoting a majorcareer related activities

x17

- a dummy variable denoting a major

x18a dummy variable denoting a major

activities

11

interest in college future

interest in college sports

interest in "other" college

12

x19

- a dummy variable denoting that father or mother exerted amajor influence on the student's choice

x20

- a dummy variable denoting that counselor or teacher exerteda major influence on the student's choice

x21

- a dummy variable denoting that a friend exerted a majorinfluence on the student's choice

x22

- a dummy variable denoting that a relative or some otherperson exerted a major influence on the student's choice

Several prior studies have shown the importance of sex and race in

determining college choice.11

Moreover, a large body of literature supports

the importance of high school grades.12

Where researchers Lave laid SES

measures aside, both father's income and education appear to affect college

choice.13

Although the presence of both variables might result in some

multicollinearity in the model this does not seem to be a serious problem

when viewed in terms of the zero-order correlations of the variables.14

We have utilized father's income as reported by the student. Al-

though this measure may understate "true" income a comparison of father's

income and occupation as reported by students with census estimates suggests

that the means and standard deviations of the reported income variable are

reasonable.15

Both the activities variables and the variables denoting

major influence on the student come from student responses and are subject

to the difficulties discussed in the last section. Finally, the importance

of parental and peer group influence has been documented by several re-

searchers.16

Studies using aggregated data for prediction and applied to more

detailed college classifications have found that college prices affect

student choice.17

Experiments with several functional forms and with

12

13

several formulations of the price variable did not produce a meaningful

effect when a price variable was added to the model. Since some :students

not interested in colleges may have considered a college but did not in-

dicate this fact on the questionnaire, prices could only be obtained for

those planning on attending a two or four-year college. This resulted in

a positive sign on the price coefficient in the regression predicting

choice of a four-year college. One interpretation of this sign is that

including price in the model raises R2in much the same way that placing

the dependent variable on both sides of the equation would. Another

interpretation is that the price variable acts as a proxy for college

benefits. A rise in college price indicates that expected benefits in-

crease. However, the latter hypothesis does not explain the negative sign

on the price variable in the other regressions.

An important difference between this study and several others18

is

that it estimates conditional probabilities for several post-high school

alternatives including:

no further education beyond high school

attend a business or vocational school

attend a junior college

attend a junior college then continue to a four-year school

attend a four-year college

y, =

Y2=

y3 =

y4 =

y5

=

Some students were unsure of their future plans and these students

were included in the sample used to estimate the regressions so as to

provide a 6th alternative of uncertainty. Together, the six categories

cover all alternatives available to the student.

13

14

Regression Results

For each individual in the sample and for each alternative educa-

tional choice we estimated a simple linear regression equation of the

form22

(2) y= a+ E bx+ u1=1

where y and the xihave been defined above and u is a random disturbance

term.19

Non-significant terms were then removed using addition to R2

as

a criterion for removal and the equation re-estimated. Appendix Table 3

gives zero order correlations for the variables while the regression re-

sults appear in Table 1.

The left-hand column of Table 1 lists the independent variables.

Note that "regression intercept" represents a in equation (2) and that

since the model is in dummy variable form, each of the bi coefficients

represents an addition to or subtraction from the intercept term. To

obtain the coefficients for a particular regression equation, one reads

down the appropriate column. The regression intercept for the four-year

college regression is -.008. If a student is male this adds .032 to the

intercept and if his father has a college education this raises the inter-

cept by an additional .158.

[Table 1 about here]

Note that the signs on the regression coefficients and their direction

of change support our a priori expectations. For example, consider the high

school rank variables contained in the first regression. As a student's

rank rises, the probability that he will stop his education decreases. An

increase in high school rank also lowers the probability of attendance at

14

TABLE 1

Regression Estimates of the Effects of the Independent Variables on VariousCareer Choices

Educational Choice

No FurtherVariable Education

Business orVocational

School

JuniorCollege

Junior Collegeand Second 2Year College

Four YearCollege

Regression.185 .134 .049 -.008Intercept

Male -.044 -.073 .097 .033

(.010) (.014) (.018) (.015)

Black -.068 -.090 -.077 -.011

(.015) (.020) (.025) (.022)

Cubar. -.056 .115 -.084

(.016) (.028) (.024)

High School Rank

-.066 -.080 -.027 .316Top 25%(.016) (.022) (.029) (.024)

25-50% -.046 -.002 .060 .070

(.016) (.022) (.028) (.024)

50-75% -.024 .003 .040 .006

(.015) (.021) (.027) (.023)

Father's Education

.084Some Beyond HighSchool (.019)

College Degree -.019 -.056 -.048 .158

(.013) (.015) (.017) (.010)

Father's Income

.073 -.014 .053 -.037$3,000-$4,999(.021) (.021) (.028) (.030)

$5,000-$9,999 -.006 .029 .039 -.039

(.016) (.019) (.022) (.023)

15

TABLE 1--Continued

$10,000-14,999 -.014 .017 .049 -.064(.015) (.018) (.021) (.023)

$15,000 4. -.038 -.018 .016 .040(.015) (.018) (.021) (.023)

Desired Activities

Social -.113 -.153 .172 .153

(.016) (.019) (.028) (.024)

Special Interest -.113 -.133 .189 .127

(.018) (.022) (.032) (.027)

Political Act. -.103 -.150 .118 .248

(.031) (.037) (.055) (.046)

Future Career -.086 -.134 .137 .053

(.017) (.020) (.030) (.026)

Sports -.092 -.149 .126 .130

(.014) (.017) (.026) (.021)

Other -.074 -.093 .065 .036

(.023) (.028) (.042) (.035)

Major Influenceon Student

Father or Mother -.056 .054 .098 .015

(.015) (.021) (.027) (.022)

Counselor or -.070 .060 .105 -.038

Teacher (.025) (.034) (.044) (.036)

Friend .046 .109 -.020 -.046

(.020) (.028) (.036) (.029)

Relative or -.024 .030 .078 .002

Other (.021) (.029) (.037) (.031)

Multiple CorrelationCoefficient (R2)= .12 .07 .05 .08 .25

F -Test = 16.2 17.3 7.9 12.9 35.9

Standard Error = .24 .29 .33 .43 .36

16

15

a junior college while raising the probability that a student attends a

four-year college. Thus, our results seem to be consistent. Similarly,

a low father's income raises the probability that a student stops and

lowers the probability of his going to a four-year school while a father's

income in excess of $15,000 has the reverse effect.

In general the coefficients on the high school rank and desired

activities variables are larger than those of any other variables in the

regressions where they appear. If the b coefficients are converted to beta

coefficients to provide "standardized" regression weightsithe high school

rank coefficients become larger while the desired activities terms are no

longer as important as a student's sex.20

Interestingly, desired activi-

ties have an important effect on the probability of choosing all educational

alternatives except junior colleges. This finding seems especially inter-

esting since the sign on the activities variable is negative in the first

two regressions and positive in the second two. Perhaps junior colleges

provide benefits which neither attract students interested in college

benefits nor repel them.

As expected, the regression equations differ in their ability to

explain variations in the dependent variable. The regression equation for

four-year colleges best fits the data (R2 = .25) while the worst fit

(R2= .05) occurs for the regression explaining choice of junior college.

While all of the R2's obtained seem low, this is not uncommon for regres-

sion estimates obtained from disaggregated data.

Before using the regression coefficients in Table 1 to obtain con-

ditional probabilities, we must first adjust our estimates. If no adjust-

ment is made to the summed coefficients, then the conditional probability

16

obtained may fall outside the 0-1 range normally associated with prob-

abilities. It may only take one reason to convince a student not to go

to college (i.e., low grades) so that other factors discouraging atten-

dance (i.e., such as low family income) may make the actual probability

of attending college less than zero. Thus, a less than zero estimate

occurs due to the additivity assumption made in equation (2).

[Table 2 about here]

To remove the worst effects of the additivity assumption, we utilize

a method suggested by Orcutt.21

The fitted values (y*) from the regression

are grouped into relatively homogeneous categories and a mean and standard

error of the residuals (i.e., the difference between the actual value and

the fitted value y*) are computed.22

A t-test is then used to determine

whether the residual estimate of a category differs significantly from

zero. The user sums his coefficients and looks up the mean and residual

for this value of y*. If the mean is significant, he adds it to the y*

estimate. If not, then the summed coefficients provide a correct estimate

of conditional probability. Table 2 gives adjustment factors and their

standard errors for each of the regressions presented above. An asterisk

denotes that the t-test suggested a mean significantly different from zero.

Using the regression coefficients in Table 1 and the adjustments in

Table 2, we can predict the post-high school choices of students with dif-

ferent characteristics. From among several alternative combinations of

coefficients, four situations commonly found in the sample have been chosen.

Entry I of Table 3 shows the probability that a male student in the top 25%

of his class with a father earning more than $15,000, interested in sports,

18

TABLE 2

Adjustments of the Residual Terms

No Further Education Business or Vocational School

Range of Y* Mean ofResiduals

StandardError

Range of Y* Mean ofResiduals

StandardError

Less than-.100 .116* .004 -.073 to .006 .031* .002-.100 to -.035 .068* .003 .006 to .031 .005* .002-.035 to -.003 .026* .004 .031 to .054 .002* .001-.003 to .027 -.003 .002 .054 to .069 -.032* .006.027 to .065 -.036* .008 .069 to .159 -.029* .006.065 to .100 -.037 .009 .159 to .171 -.008* .004.100 to .133 -.062 .013 .171 to .202 .005 .004.133 to .177 -.001 .001 .202 to .215 .043* .011.177 and above .061* .014 .215 and above -.006 .005

junior College Junior College Plus Second Two Years

Range of Y* Mean ofResiduals

StandardError

Range of Y* Mean ofResiduals

StandardError

-.069 to .051 .029* .002 -.055 to .118 .037* .001.051 to .081 -.004* .001 .118 to .188 -.013* .002.081 to .113 -.026* .004 .189 to .228 .014* .003.113 to .137 -.021* .005 .228 to .285 -.024* .004.137 to .158 -.020* .005 .285 to .312 -.010* .003.158 to .182 -.008* .003 .312 to .358 -.062* .001.182 to .218 .039* .008 .358 to .416 .010* .004.218 and above .012* .004 .416 and above .048* .008

Four Year College Choice

Range of Y* Mean ofResiduals

StandardError

-.177 to -.008 .071* .005

-.008 to .046 .025* .005

.046 to .103 -.011* .004

.103 to .168 -.046* .011

.168 to .244 -.052* .013

.244 to .347 -.054* .014

.347 to .469 -.002 .003

.469 to .802 .067* .017

19

17

and with father or mother influencing his choice, will choose each of the

educational alternatives. Column 1 presents probabilities for whites,

while Columns 2 and 3 give probabilities for blacks and Cubans (based

upon the regression coefficients in Table 1). Note the addition of the

"unsure" category to the set of educational alternatives. We have not

estimated a separate probability for an "unsure" answer since the category

can be obtained as one minus the sum of the other probabilities.

[Table 3 about here]

The probability that a white student will choose a junior college

rises as his father's income level and his own rank in high school fall.

This also holds true for Cubans and blacks, although the probability in-

creases more for Cubans than for other groups. Moreover, the probability

that a student will attend a four-year college decreases as his high school

rank and his father's income fall. Blacks from $15,000+ families and in

the top 25% of their class choose four-year colleges as frequently as whites.

While this also holds true at lower income levels, a greater percentage of

blacks are unsure of their plans and fewer plan to attend junior college as

compared to whites. These results appear to be consistent with an earlier

finding, using Coleman report data, that being black is not negatively

associated with college attendance but rather with completion of high

school.23

Junior colleges are favored by Cubans of almost all income levels.

One explanation of this may be a desire among Cubans to live in the Miami

area.24 While Cubans are not as likely to choose a four-year college as

whites or blacks, they are at least as likely to select a four-year

20

TABLE 3

Probability of A Male Student Continuing His Education

I. Male Student with A Father Whose Income Exceeds $15,000, in the Top 25%of His Class, with Family Influencing His Educational Choice

II

Predicted Probability

White Black Cuban

No Further Education .02 .00 .00

Business or Vocational School .02 .02 .02Junior College .05 .02 .05

Junior College Plus Second 2 Years .28 .24 .50

Four Year College .59 .58 .43Unsure .04 .14 .00

Male Student with A Father Whose Income Falls Between $10,000-14,9991, inthe Top 25% of His Class with Family Influencing His Choice

No Further Education .00 .00 .00

Business or Vocational School .05 .05 .05

Junior College .06 .02 .06

Junior College Plus Second 2 Years .28 .24 .51

Four Year College .42 .41 .28

Unsure .19 .28 .10

III. Male Student with A Father Whose Income Falls Between $5,000-9,999, inthe Second Quarter of His Class with Family Influencing His Choice

No Further Education .06 .00 .01

Business or Vocational School .03 .03 .03

Junior College .13 .06 .13

Junior College Plus Second 2 Years .49 .29 .59

Four Year College .15 .14 .07

Unsure .14 .48 .17

IV. Mole Student with A Father Whose Income Falls Between $3,000-4,999, inthe Third Quarter of His Class, Influenced bA Friend

No Further Education .43 .38 .36

Business or Vocational School .02 .02 .01

Junior College .21 .12 .19

Junior College Plus Second 2 Years .28 .23 .38

Four Year College .06 .06 .06

Unsure .00 .19 .00

Table Notes: We use a .00 to denote a probability of zero. Although the

groupings used in the Orcutt adjustment should be based upon homogeneous obser-vations, this is not completely possible. Thus, the possibility exists thatnegative probabilities may occur at group boundary points. Slightly negative

probabilities are found in the "no further education" category but the differ-ences are so small that no reordering of the groupings was undertaken.

21

18

education by utilizing the junior college plus two additional years of

schooling.

While the models presented above seem to capture the factors influ-

encing why students continue their education they add little to our knowl-

edge of why students do not continue their education. The intercept term

on the first regression indicates a 25% probability that students will not

continue. This probability is unrelated to the variables in the model and

suggests that it would be fruitful to learn more about the causes of non-

continuation. Moreover, none of the significant variables except low

father's income and friend's influence contributed positively to the

probability of non-continuation while almost all contributed negatively.

More work must be done to identify positive influences on non-completion.

Our results are preliminary. The probabilities calculated from the

above regressions should be tested against the probabilities observed in

the cells of the sample. Moreover, our models should be tested on a

broader sample of students.

A rough comparison with Schoenfeldt's study of Project Talent data

suggests that our estimates seem reasonable.25

Although Schoenfeldt used

an ability measure based upon 16 aptitude and ability scores and a socio-

economic status (SES) measure obtained from nine different items, his

results seem to be similar to ours. For example, 85% of those in his top

ability and SES quartiles planned to attend a four-year college.26

Adding

our estimate of the probability of attending junior college and continuing

for a second two years to the probability of attending a four-year college

in entry I of Table 3 gives us an equivalent estimate of 86%. Similarly,

22

19

Schoenfeldt estimates that moving from the top ability quartile to the

second quartile decreases the probability of four-year college entry by

24%. This appears to be consistent with our estimated decrease from 32%

to 07%, obtained from the last column of Table 1 by moving from the top

25% rank in high school to the 25-50% rank. Our estimates for college

attendance for the lower income groups appear to be higher than Schoenfeldt's

bit this may be due to the open door policies followed by the Miami-Dade

Junior College and to our separation of ethnic groups. On balance, the

estimates presented here appear to be plausible when compared to those

obtained from Project Talent data.

An Application of Conditional Probabilities tothe Problem of Predicting Future Incomes

Economists have increasingly recognized that the supply of human

capital (the stock of skills and knowledge of an individual) has an important

effect on the distribution of income. Studies of the process of human capi-

tal accumulation have identified two important effects of student background

characteristics. The first involves the effect of socio-economic and other

background characteristics in determining the educational level.27

For

example, low family incomes may cause students to drop out of high school.

The second involves the extent to which the returns to graduates from in-

stitutions of higher education depend upon their backgrounds, abilities,

and other factors.28 For example, high ability students may earn more as

a result of their education than average students. Since inequalities of

income stem from both of these sources, it will be useful to describe this

process in a simple model.

23

20

Let Eidenote the expected lifetime income of an individual with a

set of i characteristics. Let pij represent the estimated probability that

an individual with characteristics (i) will complete a particular educa-

tional path (j) and Eij

represent the expected lifetime income for an

individual with characteristics (i) completing path (j). Now plj consists

of two other probabilities; yli the conditional probability of choosing an

educational path as estimated in the last section, and dij the probability

that an individual will complete this path so that pli = ylidli. For sim-

plicity, let dij = 1 for all j so that we may estimate the lifetime earnings

of an individual as follows:

(3) ii= Ey' E

ij ij

The probabilities estimated earlier thus enable us to estimate the

expected future income of high school seniors with different characteristics.

Utilizing the probabilities calculated for the entries in Table 3 and an

estimate of expected lifetime incomes we have calculated expected lifetime

incomes for the students in our sample. Table 4 gives the results of our

calculations.

[Table 4 about here]

In order to estimate expected lifetime incomes it was necessary to

obtain information on cross-section incomes for individuals with different

educational backgrounds. A recent Census Bureau publication presents life-

time income estimates for those aged 18 in 1968.29

These estimates assume

constant 1968 dollars and allow for adjustments for productivity and the

probability of survival.30

We have chosen to use age 18 as the basis for

our estimates since this corresponds to the age of most high school graduates.

24

TABLE 4

Expected Discounted Future Income for High School Graduates

Whites Cubans BlacksI II

Category I $285,450 $291,470 $279,840 $181,900

Category II $269,310 $278,310 $263,470 $171,260

Category III $264,360 $266,360 $241,890 $157,230

Category IV $235,830 $245,200 $228,940 $148,810

Category I denotes a student in the top 25% of his class, influencedby his family, interested in sports, whose father's income is $15,000 ormore.

Category II denotes a student in the top 25% of his class, influ-enced by his family, interested in sports, whose father's income is$10,000 to $14,999.

Category III denotes a student in the second quarter of his class,influenced by his family, interested in sports, whose father's income is$5,000 to $9,999.

Category IV denotes a student in the third quarter of his class,influenced by his family, interested in sports, whose father's incomeis $3,000 to $4,999.

Table Notes: The data for blacks appears two ways: I assuming noincome differential and II assuming a non-white/white differential of 65%.We have assumed that all students unsure of their plans do not continuebeyond high school.

25

21

The income estimates assume a productivity increase of 3% per year and a 5%

discount rate. No allowance is made for price changes and/or incomes re-

ceived after age 64. Although we do not include the direct costs of college

attendance, the estimate of mean income includes all income recipients.

Since most college students have incomes roughly equal to 25% of those not

in college and since years of college attendance are included in the income

figures, our estimates implicitly allow for opportunity costs foregone.31

Unfortunately, the 1968 census data could not be disaggregated to

provide an Eij

for each characteristic used in the probability model so

that mean earnings for each educational level (E j) were substituted in the

calculation. This did not seem to be a reasonable procedure when dealing

with black incomes so we have attempted to adjust black incomes downward

to reflect prevailing discrimination differentials. A reasonable estimate

of the white-black income ratio is 65% based upon prior studies and we have

used this estimate in column 4 of Table 4.32

Several other assumptions should also be noted. The Census Bureau

does not provide separate estimates for graduates of business and vocational

schools. Nor does it report on the income of junior college students. In-

come estimates for both groups are based upon the 1-3 years of college

census grouping. A second problem involved our own data. The probability

estimates in Table 3 include the responses of students unsure of their

future plans. In calculating expected incomes a decision had to be made

with respect to the estimated income of this group. We have assumed that

students unsure of their plans do not continue their education. While this

creates a downward bias in the income estimates for lower income groups it

26

22

should also be noted that our estimates of expected incomes of upper income

groups are also biased downward since no allowance is made for different

incomes from post-college training.

In general, our estimates suggest that high school graduates from

upper income homes have higher expected incomes than those from lower in-

come homes. On the average, the expected income of a student in Category

IV tends to be about 83% of the income of a student from Category I and

this differential would be even greater if incomes were adjusted for re-

turns to ability.

Our estimates for the ethnic groups seem reasonable. Blacks have

lower expected incomes (by about 2%) than whites even before adjusting for

non-white income differentials but receive significantly lower incomes after

adjustment for discrimination. Although there has been some lessening of

discrimination in recent years, the non-white/white ratio may start to grow

again unless the economy moves in the direction of full employment. The

Cuban income estimates seem high at first glance although they are reason-

able if one realizes that the Cubans in the Miami area are in the middle

class and place great value on education.

Conclusion

OLS estimation of conditional probabilities of post-high school choice

has several advantages. The technique is tried and true and its properties

have been studied and documented. The researcher can use easily identifi-

able variables and, after suitable adjustment, determine probabilities

directly from the regression coefficients. Moreover, the need for an

27

23

a priori specification of the model forces the researcher to give some

thought to the appropriate functional form of the model.

Unfortunately, the OLS approach breaks down as the number of indepen-

dent variables entered into the model increases since this inevitably leads

to the mi'lticollinearity problem. But as Glauber and Farrar point out

"(s)uccessful forecasts with multicollinear variables require not only the

perpetuation of a stable dependency relationship between y and X (a matrix),

but also the perpetuation of stable interdependency relationships within

uX.

33Since these conditions are met only in a context where the forecast-

ing problem is trivial, this leaves several alternatives. The prediction

model can be scaled down by discarding some of the prior theoretical in-

formation brought to the problem. This may involve a cost in terms of the

predictive accuracy of the model. Alternatively, the researcher may reduce

his data into a set of significant orthogonal common factors and use these

for prediction. Where these factors can be directly identified with mean-

ingful characteristics, this technique can provide an alternative to the one

considered above. More often each factor turns out to be an artificial

combination of the original variables which is difficult to identify.34

Thus, while the use of factor analysis regression avoids the charge that

too much importance is given to the effects of a single variable, it sub-

stitutes an artificial variable which may be of limited usefulness for

practical applications.

In general the results obtained above using OLS techniques are en-

couraging. While the R2

in our models are low, the signs on the coeffi-

cients meet our expectations and suggest the importance of including

28

24

indicators of college benefits in predictive models. Although the perfor-

mance of several price variables tested in the model is disappointing,

this does not rule out the possibility that once the benefits desired by

students are more precisely captured in the model, a price variable will

enter with a negative sign. These results suggest the need for further

study of post high school choice; especially of those students choosing

to continue their education but not at four-year schools.

29

FOOTNOTES

1. H. A. Simon, "Causal Ordering and Identifiability" in Studies inEconometric Method, ed. by W. Mood and T. Koopmans (New Haven: YaleUniversity Press. Cowles Monograph 14), p. 50.

2. We use the term "student" to refer to all of those involved in thecollege decision. In fact, parents often play a major role in determininga student's post high school choice and this fact will be introduced intoour later model.

3. For example, the Census Bureau frequently uses the undiscounteddollar value of a college education whereas economists prefer to speak ofthe present value of future earnings. For a simple explanation of the dif-ference in the two concepts, see D. Witmer, "Economic Benefits of CollegeEducation" in Review of Educational Research, October, 1970.

4. B. Weisbrod, External Benefits of Public Education (New Jersey:Princeton University Press, 1964), p. 19. I am grateful to Weisbrod forsuggesting several of these points.

5. A. Astin, The College Environment (Washington: American Councilon Education, 1967).

6. J. L. Holland, "Explorations of a Theory ofAchievement II. A Four Year Prediction of First YearHigh Aptitude Students," Psychological Monographs, NoAmerican Psychological Association).

7. See C. M. Grigg, W. S. Ford, H. Tuckman, D.a Second Two Year University, A Report to the Floridasity, December, 1970.

Vocational Choice andCollege Performance of

. 570. (Washington:

Muse, The Demand forInternational Univer-

8. H. Tuckman, High School Inputs and Their Contribution to SchoolPerformance (mimeo), Spring, 1970.

9. Grigg, et al., 1012.. Cit., Chap. 2.

10. See N. Draper and H. Smith, Applied Regression Analysis (New York:John Wiley, 1966), pp.. 67-69.

11. John Conlisk, "Determinants of School Enrollment and SchoolPerformance," Journal of Human Resources, Spring, 1969.

12. See, for example, B. Bloom and F. Peters, The Use of AcademicPrediction Scales for Counseling and Selecting College Entrants (New York:

Free Press of Glencoe, 1961).

13. Grigg, 02. Cit., Chap. 4.

30

14. See Appendix Table 3.

15. Grigg, Op. Cit., Chap. 2.

16. See, for example, Grigg, Chap. 4.

17. S. Hoenack, Private Demand for H4ler Education in California(unpublished dissertation, University of California, Berkeley, 1969).

18. See S. Hoenack, Op. Cit., Chap. 3, 4; W. Sewell and V. Shah,"Socioeconomic Status, Intelligence, and the Attainment of Higher Education,"Sociology of Education, Fall 1967; and R. Radner and,. Miller, "Demand andSupply In U. S. Higher Education: A Progress Report;American EconomicReview, May 1970.

19. The use of a dichotomous variable as the dependent variableviolates the assumption of homoscedasticity of the OLS model. This meansour estimates are not as efficient as those obtained when all assumptionsare met. Given time constraints and other limitations in the model, theadditional effort required for more 1:efined techniques was not thought tobe justified.

X20. Beta coefficients can be computed from the formula B = b

i s

i

ywhere sxi is the standard deviation of the x

iparameter and sy represents

the standard iation for the dependent variable.

21. Guy Orcutt, Microanalysis of Socioeconomic Systems: A SimulationStudy (New York: Harper, 1961).

n

22. More formally, for each category we have T1i=1

(y y) = e.

Unfortunately, the choice of these groupings is arbitrary since ()routesonly restriction is that they be homogeneous. As a result, the conditionalprobabilities obtained by the researcher are not invariatecto the residualgroups chosen.

23. See. H. Tuckman, OR. Cit.

24. Grigg, et al, Olt. Cit., especially Chap. 5.

25. L. Schoenfeldt, "Education After High School," Sociology ofEducation, Fall 1968.

26. Ibid., p. 359.

27. Examples of this approach are found in Conlisk, Op. Cit.,Tuckman, 212.. Cit., and the model presented above.

31

28. See, for example, W. Hansen, B. Weisbrod, and W. S. Scanlon,"Schooling and Earnings of Low Achievers," American Economics Review,June, 1970 and J. Gwartney, "Changes in the Non-White/White IncomeRatio 1939-67," American Economic Review, December, 1970.

29. U. S. Department of Commerce, Bureau of the Census, "AnnualMean Income, Lifetime Income, and Educational Attainment of Men in theUnited States For Selected Years, 1956 to 1968," Current PopulationReports, Series P-60, No. 74, October 30, 1970.

30. The formula used to ca'culate the estimates is

V18

64

E YnPn (1+x)N-18+1/2

N=18

(1+11)N-18+1

Where V stands for the presented value of income received from age 18to 64, Y

nindicates average income at age N, P

ngives the relative

number of survivors at age N of those alive at age 18 (based on U. S.life tables), x represents a 3% productivity increase, and R stands fora 5% discount rate. For further information, see "Annual Mean Income...,"pp. 18-19.

31. The present value of a college graduate's income is $297,000at age 18 and $300,000 at age 22. The difference between the two es-timates reflects the effects of beginning the discounting at a timewhen incomes are low and applying larger discount rates in periods whenincomes rise.

32. Gwartney, OIL. Cit., p. 874 and D. O'Neill, "The Effect ofDiscrimination on Earnings: Evidence from Military Test Score Results,"Journal of Human Resources, Fall, 1970, especially p. 482-484. Black

incomes tend to be less than those of whites in later years but sincethese years are heavily discounted this difference should not affectour estimates.

33. D. Farrar & R. Glauber, "Multicollinearity In RegressionAnalysis: The Problem Revisited," Review of Economics and Statistics,February, 1967.

34. Ibid, p. 97.

32

APPENDIX TABLE 1

Means and Standard Deviations of Variables Used In the EducaLional Choice Models

Variable Name Mean Standard Deviation

Male (x1) .699 .500

Black (x2) .153 .360

Cuban (x3) .112 .315

Class Rank

Top 25% of Class (x4) .266 .442

25-50% of Class (x5) .256 .436

50-75% of Class (x6) .319 .466

Education of Father

Some College Education (x7) .2G8 .406

College Graduate (x8) .206 .405

Family Income

.084

.216

.277

.411

$3,000-4,999 (x9)

$5,000 -9,999(x10)

510,000-14,999(x11)

.247 .4.52

Above $15,000(x12)

.259 .438

Activity Desired by Student

Social(x13)

.128 .335

Special Interest (x14) .088 .283

Political Activity(x15)

.027 .161

Future Career(x16)

.101 .301

Sports (x17) .172 .378

Other College Related(x18)

.049 .216

Major Influence on the Student

Fischer or Mother (x19

) .612 .487

Counselor or reacher (x20

) .057 .233

Friend (x21 ) ,105 .307

Relative or Other )

-99 .092 .290

33

APPENDIH TAB-1.1: 2

Original. Set of Variables

1.

2.

3.

4.

5.

Yale

Blac

C±27

Too C;rter of Class

Secca 7uarter of Class

22.

23.

24.

Future Career Activities

Sorts Activities

Other Activities

Father or Mother is MajorInfluence

Counselor or Teacher is Major6. Third Tharter of Class Influence

7. Father has s)me College 27. Friend is Plajor InfluenceIdl:cation

23. Relative or Other is Major8. Father is College Graduate Influence

9. Father's Income is $3,000-4,999 29. Value of Education is Financial

10. Father's Income is $5,000-9,999 30, Value of Education is Knol.:ledge

11. Father's Income is $10,000-14,999 31. Value of Education is Civic

12. Father's Income is $15,000+ 31.. Value of Education is Other

13. Social Science Major 33. Tuition of First College Considered

14. Fia,, Arts Major 34. Tuition Plus Fees of First CollegeConsidered

15. Science Major35. Tuition of Second College Considered

16. Education Major36. Tuition Plus Fees of Second College

17. Business Major Considered

18, Other Major 37. Tuition times x9

19. Social Activities "AR Tuition times x10

20. Special Interest Activities 39. Tuition times x11

21. Political Activities 40. Tuition times x

34

APPENDIX TABLE 3

Matrix of Zero Order Correlations

xlx'

)x3

X4

X5

x6

X7

X8

X9

x10

`ill

x12

x13

x14

x15

x16

x17

x18

x19

x20

x21

x22

xl

1.00

-.05

-.00

-.02

.04

-.01

.00

.03

.01

-.03

.10

.04

-.07

-.07

.00

.00

.23

-.01

.03

-.04

-.08

-.03

91.00

-.15

-.12

.01

.01

-.15 -.16

.15

.02

-.01

-.17

-.01

.01

-.02

.00

.01

-.07

.04

.05

-.05

.00

x3

1.00

-.07

.02

.05-.01 -.01

.14

.17

-.08

-.13

-.03

.02

-.00

.00

.01

-.01

.05

.01

-.03

-.05

41.00-.35

-.41

.08

.12-.06

-.02

.04

.07

.09

.08

.08

.02

.05

.05

.08

.02

-.05

-.00

x5

1.00

-.40

.00

.00

.01

.01

.01

.00

.02

-.02

.04

.00

.03

.02

.01

.02

.01

-.01

1.00

.CO -.08

.02

.02

.02-.03

.01-.06

-.05

-.01

-.03

-.03

-.02

-.05

.02

.04

x.

1.00 -.26

-.05

-.04

.10

.07

.03

.02

-.00

.05

.01

.01

.05-.01

.00

-.03

C-11

x8

1.00

-.10

-.08

-.07

.25

.04

.04

.07

-.00

.06-.01

.09

-.07

-.00

-.00

x9

1.00

-.16

-.17

-.18

-.05

-.00

-.02

.01

.01

-.06

.01

.02

.03

.04

x10

1.00

-.30

-.31

.02

.01

-.03

-.03

.03

.01

-.01

.08

.02

-.00

x11

1.00

-.34

.00

-.04

-.00

.02

.04

.03

.02

-.01

.00

.02

x12

1.00

.07

.03

.04

-.00

-.01

.02

.06

-.06

.01

-.03

x13

1.00

-.12

-.06

-.13

-.18

-.09

.06

-.00

-.01

-.01

14

1.00

-.05

-.10

-.14

-.07

.03

.02

.02

-.03

x15

1.00

-.06

-.08

-.04

.01

-.01

-.01

.04

x16

1.00

-.L5

-.03

.03

.01

-.06

-.04

x17

1.00

-.10

.12

.01

-.U5

-.04

x18

1.00

.01

.02

-.01

.01

x19

1.00

-.31

-.43

-.40

x90

1.00

-.0")

x71

1.00

-.11