DUMMY VARIABLES – INDEPENDENT & DEPENDENT DUMMY VARIABLES
Dummy variables are independent variables which take the value of either 0 or 1. Just as a "dummy"
is a stand-in for a real person, in quantitative analysis, a dummy variable is a numeric stand-in for a
qualitative fact or a logical proposition. For example, a model to estimate demand for electricity in a
geographical area might include the average temperature, the average number of daylight hours, the
the total square footage of structures, the number of businesses, the number of residences, and so
forth. It might be more useful, however, if the model could produce appropriate results for each
month or each season. Using the number of the month, such as 12 for December, would be silly,
because it implies that the demand for electricity differs greatly between December and January,
which is month 1. It also implies that winter occurs during the same months everywhere, which
would preclude the use of the model in the opposite hemisphere. Thus, another way to
represent qualitative concepts such as season, male or female, smoker or non-smoker, etc., is
required for many models to make sense.
In a regression model, a dummy variable with a value of 0 will cause its coefficient to disappear from
the equation. Conversely, the value of 1 causes the coefficient to function as a supplemental
intercept, because of the identity property of multiplication by 1. This type of specification in a linear
regression model is useful to define subsets of observations that have different intercepts and/or
slopes without the creation of separate models. In logistic regression models, encoding all of the
independent variables as dummy variables allows easy interpretation and calculation of the odds
ratios, and increases the stability and significance of the coefficients. Examples of these results are in
Section 3. In addition to the direct benefits to statistical analysis, representing information in the
form of dummy variables makes it easier to turn the model into a decision tool. Consider a risk
manager who needs to assign credit limits to businesses. The age of the business is almost always
significant in assessing risk. If the risk manager has to assign a different credit limit for each year in
business, it becomes extremely complicated and difficult to use because some businesses are several
hundred years old. Bivariate analysis of the relationship between age of business and default usually
yields a small number of groups that are far more statistically significant than each year evaluated
separately.
In regression analysis, a dummy variable (also known as indicator variable or just dummy) is one
that takes the values 0 or 1 to indicate the absence or presence of some categorical effect that may
be expected to shift the outcome. For example, in econometric time series analysis, dummy variables
may be used to indicate the occurrence of wars, or major strikes. It could thus be thought of as a
truth value represented as a numerical value 0 or 1 (as is sometimes done in computer
programming).
The addition of dummy variables always increases model fit (coefficient of determination), but at a
cost of fewer degrees of freedom and loss of generality of the model. Too many dummy variables
result in a model that does not provide any general conclusions.
Dummy variables may be extended to more complex cases. For example, seasonal effects may be
captured by creating dummy variables for each of the seasons: D1=1 if the observation is for
summer, and equals zero otherwise; D2=1 if and only if autumn, otherwise equals zero; D3=1 if and
only if winter, otherwise equals zero; and D4=1 if and only if spring, otherwise equals zero. In the
panel data fixed effects estimator dummies are created for each of the units in cross-sectional data
(e.g. firms or countries) or periods in a pooled time series. However, in such regressions either the
constant term must be removed, or one of the dummies must be removed, making it the base category
against which the others are assessed, for the following reason:
If dummy variables for all categories were included, their sum would equal 1 for all observations,
which is identical to and hence perfectly correlated with the vector-of-ones variable whose
coefficient is the constant term; if the vector-of-ones variable were also present, this would result in
perfect multicollinearity, so that the matrix inversion in the estimation algorithm would be
impossible. This is referred to as the dummy variable trap.
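The trap can be checked numerically. In this sketch (hypothetical quarterly data, numpy only), a design matrix containing the constant plus all four seasonal dummies is rank deficient, while dropping one dummy restores full column rank:

```python
import numpy as np

# Hypothetical data: a season index (0-3) for each of eight observations.
n = 8
seasons = np.array([0, 1, 2, 3, 0, 1, 2, 3])
D = np.eye(4)[seasons]                     # four seasonal dummies (one-hot rows)
const = np.ones((n, 1))                    # the "vector of ones"

X_trap = np.hstack([const, D])             # constant + all four dummies
X_ok = np.hstack([const, D[:, 1:]])        # constant + three dummies (season 0 as base)

# The full set is rank deficient because the four dummies sum to the constant column.
print(np.linalg.matrix_rank(X_trap))       # 4, not 5 -> perfect multicollinearity
print(np.linalg.matrix_rank(X_ok))         # 4 -> full column rank
```

Because `X_trap` has five columns but rank four, the normal equations cannot be inverted, which is exactly the dummy variable trap described above.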
Describing qualitative data
Not all of the data of interest to econometricians are quantitative. For instance, the gender of
individuals, whether they are married, the industry of firms, and countries or regions are all
considered qualitative. To include them in a regression, we use dummy variables. In
many cases, the information can be described as being true or false or the character present
or absent. In those cases, it is easy to set up a binary variable or dummy variable taking
values 0 and 1.
For instance, a male dummy is usually set to 1 when the individual is male and 0 when female,
while if we instead define a female dummy we would do the opposite. Either is clearer than an
ambiguous gender variable. The choice does not matter to the results, but it does matter to their interpretation!
Describing categories or ranges
Dummy variables are also useful to describe categories. Indeed, even if the variable is not
binary, if it takes a finite number of values then it can be described by a complete set of
dummy variables. For instance, if eye colour can be brown, blue, green or red, we can have
four dummy variables, one for each of these colours, taking the value 1 whenever an individual
has eyes of that colour. (The case is more complex for David Bowie, whose eyes appeared to be
two different colours.) Notice that summing all variables in a complete set
should give you 1 for all observations. This technique can also be useful for quantitative data
which you do not believe should be considered as one continuous variable. Dummy variables
are 'discrete' and 'qualitative' (e.g., male or female, in the labour force or not, working under
a collective or individual employment contract, renting or owning your home). Units of
measurement are ‘meaningless’. Normally 1 is assigned to the presence of some
characteristic or attribute; 0 for the absence of that characteristic or attribute.
A dummy variable for several ranges allows you to distinguish the effects of what you might
see as “thresholds”.
Example 1: in the Mincer equation, we often use dummy variables for high school dropouts,
high school graduates, etc.
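This bucketing can be sketched in Python (hypothetical cut-offs: dropout below 12 years of schooling, high school graduate from 12 to 15, college graduate 16 or more):

```python
import numpy as np

# Hypothetical years-of-schooling data, bucketed at the assumed thresholds.
years = np.array([9, 12, 16, 11, 14, 18])
group = np.digitize(years, bins=[12, 16])   # 0 = dropout, 1 = HS graduate, 2 = college

dropout = (group == 0).astype(int)          # one dummy per threshold group
hs_grad = (group == 1).astype(int)
college = (group == 2).astype(int)
print(list(dropout))   # [1, 0, 0, 1, 0, 0]
```

Each dummy then enters the Mincer regression in place of a single continuous schooling variable, so the coefficient on each group is the estimated threshold effect.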
Example 2: A regression model of labour market discrimination by gender:

Yi = β0 + β1Si + β2Gi + εi

where Yi = annual earnings
Si = years of education
Gi = 1 if the ith person is male
     0 if the ith person is female

There are no special estimation issues as long as the regression meets all the classical assumptions.
Only the nature of the independent variables has changed.

The expected salary of a female is: E(Yi | Si, Gi = 0) = β0 + β1Si

The expected salary of a male is: E(Yi | Si, Gi = 1) = (β0 + β2) + β1Si

since E(εi | Si, Gi) = 0. Testing for discrimination (i.e., H0: β2 = 0) is a test for a difference in the intercept terms.
Intercept shift
Dummy Variable Trap: Suppose we estimate the following:

Yi = β0 + β1Si + β2Fi + β3Mi + εi

where Fi = 1 if the ith person is female
     0 if the ith person is male
Mi = 1 if the ith person is male
     0 if the ith person is female

This is known as the 'Dummy Variable Trap'. We are including redundant information in the
regression. Suppose the sample looks like this:
[Figure: intercept shift — men: wage = (β0 + β2) + β1Si; women: wage = β0 + β1Si. The two lines share the slope β1 and differ by β2 in intercept.]
Constant  Fi  Mi
1         1   0
1         0   1
1         1   0
1         0   1
1         1   0
1         1   0
1         0   1
The problem is that the two dummies are a linear function of the constant (i.e., Fi+Mi = 1).
Perfect multicollinearity. Violates Assumption (6). Estimated coefficients and their standard
errors can’t be computed.
The solution is simple -- drop a dummy variable or the constant term.
Rule of Thumb: If you have 'm' categories, then use 'm-1' dummies.
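The rule of thumb can be sketched in plain Python (hypothetical contract data; the first category serves as the omitted baseline):

```python
# m = 3 categories -> m - 1 = 2 dummies, with the omitted category as baseline.
contracts = ["collective", "individual", "none", "collective", "none"]

categories = ["collective", "individual", "none"]
baseline = categories[0]                      # omitted category
dummies = {
    c: [1 if obs == c else 0 for obs in contracts]
    for c in categories if c != baseline      # only m - 1 dummies are created
}
print(dummies["individual"])  # [0, 1, 0, 0, 0]
print(dummies["none"])        # [0, 0, 1, 0, 1]
```

Observations with neither dummy equal to 1 belong to the baseline category, whose effect is absorbed by the constant term.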
Slope dummy variables: We could allow for differences in the returns to education by adding an
'interacted' variable:

Yi = β0 + β1Si + β2Gi + β3(Si × Gi) + εi

This is a more 'flexible' specification.

The expected salary of a female is: E(Yi | Si, Gi = 0) = β0 + β1Si

The expected salary of a male is: E(Yi | Si, Gi = 1) = (β0 + β2) + (β1 + β3)Si

We now have both a 'composite' intercept term and a 'composite' slope coefficient for males.
If β2 > 0, then the male regression line has a higher intercept.
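A minimal numerical sketch of this specification (synthetic data with hypothetical parameter values, numpy only): generate earnings with a gender-specific intercept and slope, then recover both by OLS on the interacted design.

```python
import numpy as np

# Synthetic earnings data; the "true" coefficients below are assumptions for illustration.
rng = np.random.default_rng(0)
n = 200
S = rng.uniform(8, 20, n)            # years of education
G = rng.integers(0, 2, n)            # 1 = male, 0 = female
Y = 10 + 1.5 * S + 4 * G + 0.5 * S * G + rng.normal(0, 1, n)

# OLS on constant, S, G and the interacted variable S*G.
X = np.column_stack([np.ones(n), S, G, S * G])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
# Female slope: b[1]; male slope: b[1] + b[3]; male intercept shift: b[2].
```

The estimates should land near the generating values (10, 1.5, 4, 0.5), so the male line has both a higher intercept and a steeper slope, as the 'flexible' specification allows.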
Using a set of dummy variables
What happens if we use a complete set of dummy variables?
The four dummies sum to one, hence we have perfect collinearity. The regression will not be
able to identify the coefficients properly. It is as if we had a single variable always equal to
one (like the intercept). One possible way out is to drop the intercept. Each dummy
coefficient will then be interpreted as the intercept for that specific group.
Another (more common) possibility is to drop one variable in the set. This will be the
baseline and the other dummy coefficients will read directly as the difference from this
baseline.
Example from Alesina, Algan, Cahuc and Giuliano (2009)
Dummy variables in R
By default, R will automatically remove the last dummy variable if you provide a complete
set. However, you are well-advised to do it yourself as this will help with the interpretation,
and also because other software may not be as kind. There are many methods to create
dummy variables from qualitative data.
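In Python, a rough analogue (assuming pandas is available) is `pandas.get_dummies`, where `drop_first=True` omits the first level as the baseline and so avoids the trap:

```python
import pandas as pd

# Hypothetical qualitative data.
df = pd.DataFrame({"region": ["north", "south", "east", "south"]})
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies.columns.tolist())  # ['region_north', 'region_south'] ('east' is the baseline)
```

As with R, it is clearer to choose the omitted category yourself than to rely on the software's default ordering (here, alphabetical).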
Fixed effects
Dummy variables are also frequently used as fixed effects. Typically, we might add time fixed
effects to our regression to capture structural changes underlying it. For instance, this could
be a dummy variable for each year or each period (minus one).
In many cases, it is also useful to define a set of individual fixed effects to capture all
unobserved time-invariant individual characteristics. This can lead to a potentially large number of
dummy variables, which is usually not a problem for modern computers. However, you
must have several observations for each individual, or you will run out of degrees of freedom!
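Building both sets of fixed effects can be sketched as follows (a hypothetical balanced panel of 3 firms over 4 years, numpy only; one firm and one year are dropped as baselines):

```python
import numpy as np

# Hypothetical panel: 3 firms, each observed in 4 years (12 rows in total).
firms = np.repeat([0, 1, 2], 4)                  # firm id per row
years = np.tile([2001, 2002, 2003, 2004], 3)     # year per row

firm_fe = np.eye(3)[firms][:, 1:]                # firm dummies, first firm as baseline
year_fe = (years[:, None] == np.unique(years)[1:]).astype(int)  # year dummies minus one

X = np.hstack([np.ones((12, 1)), firm_fe, year_fe])
print(X.shape)                    # (12, 6): constant + 2 firm + 3 year dummies
print(np.linalg.matrix_rank(X))   # 6 -> no dummy variable trap
```

With only one observation per firm, the firm dummies alone would exhaust the degrees of freedom; the repeated observations are what make the fixed effects estimable.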
Dummy Dependent Variables Models
In this chapter we introduce models that are designed to deal with situations in which our
dependent variable is a dummy variable. That is, it assumes either the value 0 or the value 1.
Such models are very useful in that they allow us to address questions for which there is a
“yes or no” answer.
1. Linear Probability Model
In the case of a dummy dependent variable model we have:

Yi = β0 + β1Xi + εi

where Yi = 0 or 1 and E(εi) = 0.
What would happen if we simply estimated the slope coefficients of this model using
OLS? What would the coefficients mean? Would they be unbiased? Are they efficient?
A regression model in the situation where the dependent variable takes on the two
values 0 or 1 is called a linear probability model. To see its properties note the following.
a) Since the mean error is zero, we know that E(Yi | Xi) = β0 + β1Xi.
b) Now, if we define Pi = Pr(Yi = 1) and 1 − Pi = Pr(Yi = 0), then
E(Yi) = 1 · Pi + 0 · (1 − Pi) = Pi. Therefore, our model is Pr(Yi = 1) = β0 + β1Xi and the estimated
slope coefficients would tell us the impact of a unit change in that explanatory
variable on the probability that Yi = 1.
c) The predicted values from the regression model would provide
predictions, based on some chosen values for the explanatory variables, for the
probability that Yi = 1. There is, however, nothing in the estimation strategy that
would constrain the resulting predictions from being negative or larger than 1 —
clearly an unfortunate characteristic of the approach.
d) Since E(εi) = 0 and εi is uncorrelated with the explanatory variables (by assumption), it is
easy to show that the OLS estimators are unbiased. The errors, however, are
heteroscedastic. A simple way to see this is to consider an example. Suppose that the
dependent variable takes the value 1 if the individual buys a Rolex watch and 0 otherwise.
Also, suppose the explanatory variable is income. For low levels of income it is
likely that all of the observations are zeros. In this case, there would be no scatter
around the line. For higher levels of income there would be some zeros and some
ones. That is, there would be some scatter around the line. Thus, the errors would be
heteroscedastic. This suggests two empirical strategies. First, we know that the OLS
estimators are unbiased but that the usual standard errors are incorrect; we might
simply use OLS and then apply the White correction to produce correct standard
errors. Second, we might model the heteroscedasticity explicitly and estimate by weighted
least squares.
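The out-of-range problem in (c) is easy to demonstrate with a small sketch (hypothetical buy/income data, numpy only):

```python
import numpy as np

# Hypothetical data: buy (1/0) driven by income, fitted by OLS (the LPM).
income = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
buy = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(income), income])
b, *_ = np.linalg.lstsq(X, buy, rcond=None)

p_hat = X @ b                      # fitted "probabilities"
print(p_hat.min(), p_hat.max())    # note values below 0 and above 1
```

The fitted line predicts a negative "probability" at the lowest income and one above 1 at the highest, the unfortunate characteristic noted in (c).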
2. Logit and Probit Models
One potential criticism of the linear probability model (beyond those mentioned
above) is that the model assumes that the probability that Yi = 1 is linearly related to the
explanatory variable(s). We might, however, expect the relation to be nonlinear. For
example, increasing the income of the very poor or the very rich will probably have little
effect on whether they buy an automobile. It could, however, have a nonzero effect on other
income groups.
Two models that are nonlinear, yet provide predicted probabilities between 0 and 1,
are the logit and probit models. The difference between the linear probability model and the
nonlinear logit and probit models can be explained using an example. To motivate these
models, suppose that our underlying dummy dependent variable depends on an unobserved
("latent") utility index Yi*. For example, if the variable Yi is discrete, taking on the value 1 if
someone buys a car and 0 otherwise, then we can imagine a continuous variable Yi* that reflects a
person's desire to buy the car. It seems reasonable that Yi* would vary continuously with
some explanatory variable like income. More formally, suppose

Yi* = β0 + β1Xi + εi

and

Yi = 1 if Yi* > 0 (i.e., the utility index is "high enough")
Yi = 0 if Yi* ≤ 0 (i.e., the utility index is not "high enough")

Then:

Pr(Yi = 1) = Pr(Yi* > 0) = Pr(εi > −(β0 + β1Xi)) = F(β0 + β1Xi)

where the last step uses the symmetry of the error distribution.
Given this, our basic problem is selecting F — the cumulative distribution function of the error
term. It is here that the logit and probit models differ. As a practical matter, we are likely
interested in estimating the β's in the model. This is typically done using a Maximum
Likelihood Estimator (MLE). To outline the MLE in this context, recognize that each
outcome Yi has the density function f(Yi) = Pi^Yi (1 − Pi)^(1 − Yi). That is, each Yi takes on
the value 1 with probability Pi and the value 0 with probability 1 − Pi. Then the likelihood function
is:

L = ∏i Pi^Yi (1 − Pi)^(1 − Yi)

and

ln L = Σi [Yi ln Pi + (1 − Yi) ln(1 − Pi)]

which, given Pi = F(β0 + β1Xi), becomes

ln L = Σi [Yi ln F(β0 + β1Xi) + (1 − Yi) ln(1 − F(β0 + β1Xi))]
Analytically, the next step would be to take the partial derivatives of the log-likelihood function
with respect to the β's, set them equal to zero, and solve for the MLEs. This could be a very
messy calculation depending on the functional form of F. In practice, the computer will
solve this problem for us.
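As an illustration of this machinery, here is a sketch on synthetic data (numpy only, hypothetical true coefficients) that maximises the logit log likelihood by simple gradient ascent rather than the Newton-type iterations most packages use:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data generated from a known (assumed) logit model.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])
true_b = np.array([-0.5, 1.2])
y = (rng.uniform(size=n) < sigmoid(X @ true_b)).astype(float)

# Maximise the log likelihood by gradient ascent.
b = np.zeros(2)
for _ in range(5000):
    p = sigmoid(X @ b)
    b += 0.01 * (X.T @ (y - p)) / n   # gradient of the mean log likelihood
# b should now be close to true_b = (-0.5, 1.2)
```

The gradient X'(y − p) is exactly the first-order condition described above; setting it to zero by iteration recovers the MLEs.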
2.1. Logit Model
For the logit model we specify

Pr(Yi = 1) = F(β0 + β1Xi) = 1 / (1 + e^−(β0 + β1Xi))

It can be seen that as β0 + β1Xi → ∞, Pr(Yi = 1) → 1. Similarly, as β0 + β1Xi → −∞,
Pr(Yi = 1) → 0. Thus, unlike the linear probability model,
probabilities from the logit will be between 0 and 1. A complication arises in interpreting the
estimated β's. In a linear probability model, a coefficient measures the ceteris paribus
effect of a change in the explanatory variable on the probability that Y equals 1. In the logit model,
differentiating shows that

∂Pr(Yi = 1)/∂Xi = β1 F(β0 + β1Xi)[1 − F(β0 + β1Xi)]
Notice that the derivative is nonlinear and depends on the value of x. It is common to
evaluate the derivative at the mean of x so that a single derivative can be presented.
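For instance, with assumed (hypothetical) estimates b0 = −1.0 and b1 = 0.8 and a mean of x of 2.0, the derivative at the mean works out as:

```python
import math

# Hypothetical logit estimates and sample mean.
b0, b1 = -1.0, 0.8
x_mean = 2.0

# Marginal effect at the mean: dP/dx = b1 * F * (1 - F), with F the logistic c.d.f.
p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x_mean)))
marginal_effect = b1 * p * (1.0 - p)
print(round(marginal_effect, 3))   # ≈ 0.183
```

Note that the coefficient itself (0.8) overstates the effect on the probability; the marginal effect shrinks as the fitted probability moves toward 0 or 1.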
Odds Ratio
For ease of exposition, we write the above equation as Pi = 1 / (1 + e^−Zi), where Zi = β0 + β1Xi. To
avoid the possibility that the predicted values might be outside the probability interval of 0
to 1, we model the ratio Pi / (1 − Pi). This ratio is the likelihood, or odds, of obtaining a successful
outcome (the ratio of the probability that a family will own a car to the probability that it
will not own a car).
If we take the natural log of this ratio, we obtain

Li = ln[Pi / (1 − Pi)] = Zi = β0 + β1Xi

that is, L, the log of the odds ratio, is not only linear in X, but also linear in the parameters. L
is called the logit, and hence the name logit model.
The logit model cannot be estimated using OLS. Instead, we use the MLE discussed in the
previous section, an iterative estimation technique that is especially useful for equations
that are nonlinear in the coefficients. MLE is inherently different from least squares in that it
chooses the coefficient estimates that maximize the likelihood of the sample data set being
observed. Interestingly, OLS and MLE are not necessarily different; for a linear equation that
meets the classical assumptions (including the normality assumption), the MLEs are identical to
the OLS estimates.
Once the logit has been estimated, hypothesis testing and econometric analysis can
be undertaken in much the same way as for linear equations. When interpreting coefficients,
however, be careful to recall that they represent the impact of a one-unit increase in the
independent variable in question, holding the other explanatory variables constant, on the
log of the odds of a given choice, not on the probability itself. But we can always compute
the probability at a given level of the variable in question.
2.2. Probit Model
In the case of the probit model, we assume that εi ~ N(0, 1). That is, we assume the
error in the utility index model is normally distributed. In this case,

Pr(Yi = 1) = F(β0 + β1Xi)

where F is the standard normal cumulative distribution function:

F(z) = ∫_−∞^z (1/√(2π)) e^(−t²/2) dt

In practice, the c.d.f.s of the logit and the probit look quite similar to one another. Once
again, calculating the derivative is moderately complicated. In this case,

∂Pr(Yi = 1)/∂Xi = β1 f(β0 + β1Xi)

where f is the density function of the standard normal distribution. As in the logit case, the derivative
is nonlinear and is often evaluated at the mean of the explanatory variables. In the case of
dummy explanatory variables, it is common to estimate the derivative as the probability
when the dummy variable is 1 (other variables set to their mean) minus the probability
when the dummy variable is 0 (other variables set to their mean). That is, you simply
calculate how the predicted probability changes when the dummy variable of interest
switches from 0 to 1.
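A sketch of that discrete-change calculation for a dummy regressor (hypothetical coefficient values; the standard normal c.d.f. built from `math.erf`):

```python
import math

def normal_cdf(z):
    # Standard normal c.d.f. via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical probit: index = b0 + b1*x + b2*d, with a dummy regressor d.
b0, b1, b2 = -1.0, 0.05, 0.6
x_mean = 10.0

# Discrete-change effect of the dummy: P(d = 1) - P(d = 0), with x at its mean.
effect = normal_cdf(b0 + b1 * x_mean + b2) - normal_cdf(b0 + b1 * x_mean)
print(round(effect, 3))   # ≈ 0.231
```

The derivative-based formula would evaluate f at a single point instead; for a variable that only takes the values 0 and 1, the discrete change is the more natural quantity.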
Which Is Better? Logit or Probit
Fortunately, from an empirical standpoint, logits and probits typically yield very
similar estimates of the relevant derivatives. This is because the cumulative distribution
functions for the logit and probit are similar, differing slightly only in the tails of their
respective distributions. Thus, the derivatives are different only if there are enough
observations in the tail of the distribution. While the derivatives are usually similar, it is
important to remember that the parameter estimates associated with logit and probit models are
not. A simple approximation suggests that multiplying the logit estimates by 0.625 makes
the logit estimates comparable to the probit estimates.
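The 0.625 approximation can be eyeballed numerically: if probit coefficients are about 0.625 times logit coefficients, then the logistic c.d.f. Λ(z) should track Φ(0.625z) across a range of index values.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare Λ(z) with Φ(0.625 z) over a range of index values.
for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(z, round(logistic_cdf(z), 3), round(normal_cdf(0.625 * z), 3))
```

The two columns stay within a couple of hundredths of each other over this range; the gap only widens in the far tails, which is exactly where the text says the two models can differ.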
Example: We estimate the relationship between the openness of a country Y and a country’s
per capita income in dollars X in 1992. We hypothesize that higher per capita income should
be associated with free trade, and test this at the 5% significance level. The variable Y takes
the value of 1 for free trade, 0 otherwise.
Since the dependent variable is a binary variable, we set up the index function

Yi* = β0 + β1Xi + εi

with Yi = 1 if Yi* > 0 (open); Yi = 0 if Yi* ≤ 0 (not open).
Probit estimation gives the following results:
Dependent Variable: Y
Method: ML - Binary Probit (Quadratic hill climbing)
Date: 05/27/04 Time: 13:54
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives
Variable               Coefficient   Std. Error   z-Statistic   Prob.
C                      -1.994184     0.824708     -2.418048     0.0156
X                       0.001003     0.000471      2.129488     0.0332

Mean dependent var      0.500000     S.D. dependent var       0.512989
S.E. of regression      0.337280     Akaike info criterion    0.886471
Sum squared resid       2.047636     Schwarz criterion        0.986045
Log likelihood         -6.864713     Hannan-Quinn criter.     0.905909
Restr. log likelihood  -13.86294     Avg. log likelihood     -0.343236
LR statistic (1 df)     13.99646     McFadden R-squared       0.504816
Probability(LR stat)    0.000183
The slope coefficient is significant at the 5% level.
The interpretation in a probit model: β1 is the effect of X on the latent index Yi*. The
marginal effect of X on Pr(Yi = 1) is easier to interpret and is given by β1 f(β0 + β1X),
where f is the standard normal density.
To test the fit of the model (analogous to R-squared), the maximized log-likelihood value
(ln L) can be compared to the maximized log likelihood ln L0 of a model with only a constant
in the likelihood ratio index

LRI = 1 − ln L / ln L0
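Plugging the two log likelihoods reported in the probit output above into this index reproduces the McFadden R-squared EViews prints:

```python
# Likelihood ratio index computed from the probit output above.
lnL_full = -6.864713      # "Log likelihood"
lnL_restr = -13.86294     # "Restr. log likelihood" (constant-only model)
pseudo_r2 = 1.0 - lnL_full / lnL_restr
print(round(pseudo_r2, 4))   # ≈ 0.5048, the reported McFadden R-squared
```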
Logit estimation gives the following results:
Dependent Variable: Y
Method: ML - Binary Logit (Quadratic hill climbing)
Date: 05/27/04 Time: 14:12
Sample(adjusted): 1 20
Included observations: 20 after adjusting endpoints
Convergence achieved after 7 iterations
Covariance matrix computed using second derivatives
Variable               Coefficient   Std. Error   z-Statistic   Prob.
C                      -3.604995     1.681068     -2.144467     0.0320
X                       0.001796     0.000900      1.995415     0.0460

Mean dependent var      0.500000     S.D. dependent var       0.512989
S.E. of regression      0.333745     Akaike info criterion    0.876647
Sum squared resid       2.004939     Schwarz criterion        0.976220
Log likelihood         -6.766465     Hannan-Quinn criter.     0.896084
Restr. log likelihood  -13.86294     Avg. log likelihood     -0.338323
LR statistic (1 df)     14.19296     McFadden R-squared       0.511903
Probability(LR stat)    0.000165
As you can see from the output, the slope coefficient is significant at the 5% level.
The coefficients are proportionally higher in absolute value than in the probit model, but the
marginal effects and significance should be similar.
The marginal effect, evaluated at the mean of X, can be interpreted as the effect of GDP per capita on the expected value of Y.