
MGMG 522 : Session #4 Choosing the Independent Variables and a Functional Form

Uploaded by: gaenor


Page 1: MGMG 522 : Session #4 Choosing the Independent Variables and a Functional Form

4-1

MGMG 522 : Session #4

Choosing the Independent Variables and a Functional Form

(Ch. 6 & 7)

4-2

Major Specification Problems

1. Problem with the selection of the independent variables.
2. Problem with the functional form.
3. Problem with the form of the error term.

4-3

Problems with the Selection of the Independent Variables

The choice of independent variables is up to the researcher.

This freedom does not come without a cost. Problems:

1. Omitted variables
2. Irrelevant variables

Your underlying theory should give you some hints about which independent variables should be included in your regression model.

The statistical fit is less important than the underlying theory.

4-4

Case 1: Omitted Variables

It occurs when you don’t include important independent variables in your regression model when you should, either because you don’t think of them or because you think of them but can’t get the data.

True model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Your model: $Y = \beta_0 + \beta_1 X_1 + \varepsilon^*$

where $\varepsilon^* = \beta_2 X_2 + \varepsilon$

If $X_1$ and $X_2$ are not completely independent, $\varepsilon^*$ will not be independent of $X_1$, a violation of classical assumption #3 (all X’s are uncorrelated with $\varepsilon$).

4-5

Problems:

1. OLS is no longer BLUE.
2. Coefficient estimates are biased: $E(\hat{\beta}_k) \neq \beta_k$.
3. $VAR(\hat{\beta}_k)$ falls: the variances of the coefficient estimates decrease. See p. 4-8.

For a model with two independent variables, it can be shown that

$E(\hat{\beta}_1) = \beta_1 + \beta_2 \alpha_1$

where $\alpha_1$ is from $X_2 = \alpha_0 + \alpha_1 X_1 + u$ and $u$ is a classical error term.

Coefficient estimates could be unbiased if $\beta_2 = 0$ or $\alpha_1 = 0$. But that is unlikely.
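The bias expression can be seen by substituting the auxiliary regression of X2 on X1 into the true model. A short derivation, where $\alpha_0$ and $\alpha_1$ denote the intercept and slope of that auxiliary regression:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon \\
  &= \beta_0 + \beta_1 X_1 + \beta_2(\alpha_0 + \alpha_1 X_1 + u) + \varepsilon \\
  &= (\beta_0 + \beta_2\alpha_0) + (\beta_1 + \beta_2\alpha_1)\,X_1 + (\beta_2 u + \varepsilon)
\end{aligned}
```

A regression of Y on X1 alone therefore estimates $\beta_1 + \beta_2\alpha_1$ rather than $\beta_1$.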

4-6

$E(\hat{\beta}_1) = \beta_1 + \beta_2 \alpha_1$

The amount of bias = $\beta_2 \alpha_1$. Or, the amount of bias = $\beta_2 \cdot f(r_{X_1,X_2})$.

The direction of the bias can be determined by the signs of $\beta_2$ and $\alpha_1$. For example, (-)(-)=(+) or (-)(+)=(-).

To correct for the omitted variables problem,

1. Think again about your theory. What other important variables could be missing?
2. If the signs of the included coefficient estimates are unexpected, you could probably tell the direction of the bias and have some clues about the missing variables.
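The size and direction of the bias can be checked numerically. The sketch below uses made-up data and assumed true coefficients (none of these numbers come from the slides), and leaves out the error term so that the identity holds exactly in the sample.

```python
# Sketch: omitted-variable bias with hypothetical, noise-free data.
def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

beta0, beta1, beta2 = 1.0, 2.0, -3.0    # assumed true coefficients
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]     # positively correlated with x1
y = [beta0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

alpha1 = slope(x1, x2)        # from X2 = alpha0 + alpha1*X1 + u
short_slope = slope(x1, y)    # Y regressed on X1 only (X2 omitted)
bias = short_slope - beta1    # equals beta2 * alpha1

print("alpha1:", alpha1)
print("slope with X2 omitted:", short_slope)
print("bias:", bias)
```

Here the omitted coefficient is negative and the two regressors are positively correlated, so the bias is negative: the (-)(+)=(-) case.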

4-7

Case 2: Irrelevant Variables

It occurs when you have some unnecessary independent variables in your regression model.

True model: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$

Your model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon^{**}$

where $\varepsilon^{**} = \varepsilon - \beta_2 X_2$

The coefficient estimates will still be unbiased, but $VAR(\hat{\beta}_k)$ increases, lowering the reported t-values.

4-8

For the model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$,

$VAR(\hat{\beta}_1) = \dfrac{\sigma^2}{(1 - r_{12}^2)\sum (X_1 - \bar{X}_1)^2}$

If $r_{12} \neq 0$, the variance will increase.

If $r_{12} = 0$ or the irrelevant variable is not in the regression model, the variance will stay the same.

Now, it seems like having extra unnecessary variables is not as serious a problem as the omitted variables problem.

In fact, we want neither of these problems.
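To see how quickly the variance grows with the correlation between the two regressors, the formula can be evaluated directly. This is a minimal sketch; the values of the error variance and the sum of squared deviations are hypothetical.

```python
def var_beta1(sigma2, r12, ssx1):
    """VAR(beta1_hat) = sigma2 / ((1 - r12^2) * sum of squared deviations of X1)."""
    if not -1.0 < r12 < 1.0:
        raise ValueError("r12 must lie strictly between -1 and 1")
    return sigma2 / ((1.0 - r12 ** 2) * ssx1)

sigma2, ssx1 = 4.0, 50.0    # hypothetical error variance and sum of squares
for r12 in (0.0, 0.5, 0.9):
    print(r12, var_beta1(sigma2, r12, ssx1))
```

With r12 = 0 the result is the same variance as in the model without X2; any nonzero correlation makes it larger.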

4-9

Four Criteria to Help You Choose the Independent Variables

1. Theory
2. t-Test
3. Adj-R²
4. Bias

If all four conditions are met, that variable should be in your model.

If not, that variable doesn’t belong in your model.

If some conditions are met while some are not, use your judgment.

4-10

When you add an omitted variable, usually
– Adj-R² will rise
– Coefficient estimates will change

When you add an irrelevant variable, usually
– Adj-R² will fall
– Coefficient estimates will not change
– t-values become less significant

Don’t rely on the Adj-R² criterion alone, because it can be shown that Adj-R² will rise if you include a variable whose t-value is greater than 1 in absolute value, even if that variable is not statistically significant.

Adj-R² will also rise if you delete a variable with a t-value below 1 in absolute value from your regression model. See an example on pp. 173-176.
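The link between Adj-R² and a t-value of 1 can be illustrated with arithmetic on residual sums of squares. The sketch below uses the standard Adj-R² formula and the fact that, for a single added variable, the squared t-value equals the F-statistic for that one restriction. All numbers are hypothetical.

```python
def adj_r2(rss, tss, n, k):
    """Adjusted R-squared with k independent variables and n observations."""
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))

def t_squared(rss_without, rss_with, n, k_with):
    """Squared t-value of one added variable (the F-statistic for M = 1)."""
    return (rss_without - rss_with) / (rss_with / (n - k_with - 1))

n, tss = 30, 100.0          # hypothetical sample size and total sum of squares
rss_base = 50.0             # RSS of the model with 2 independent variables
base = adj_r2(rss_base, tss, n, 2)

for rss_new in (49.0, 47.0):    # RSS after adding a third variable
    t2 = t_squared(rss_base, rss_new, n, 3)
    new = adj_r2(rss_new, tss, n, 3)
    print(f"t^2 = {t2:.3f}, change in Adj-R2 = {new - base:+.4f}")
```

In the first case the squared t-value is below 1 and Adj-R² falls; in the second it is above 1 and Adj-R² rises, even though a t-value of about 1.3 would not usually be judged significant.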

4-11

Three Methods to Avoid When Choosing Independent Variables

1. Data Mining
2. Stepwise Regression
3. Sequential Search

You should specify as few models as possible.

The more you look, the higher the chance you will find a model that has a good statistical fit with not much theoretical support.

Do not select a variable based on its t-value, because that technique creates a systematic bias. How?

4-12

Lagged Variable

Sometimes, the change in Y is not caused by the change in X in the current period, but by the change in X in another (usually earlier) period.

The coefficient estimate of a lagged variable measures the change in Y this period as a result of a change in X in that other period.
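In practice, using a lagged variable means pairing each Y observation with the X value from an earlier period before running the regression. A minimal sketch, assuming a one-period lag and made-up series (the helper name `lag_pairs` is hypothetical):

```python
def lag_pairs(y, x, lag=1):
    """Pair Y_t with X_{t-lag}; the first `lag` observations of Y are dropped."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if lag < 0 or lag >= len(y):
        raise ValueError("lag must be between 0 and len(y) - 1")
    return [(y[t], x[t - lag]) for t in range(lag, len(y))]

y = [10, 11, 12, 13]    # hypothetical Y_1..Y_4
x = [1, 2, 3, 4]        # hypothetical X_1..X_4
print(lag_pairs(y, x, lag=1))
```

Each lag costs one observation, so the regression on the lagged data has fewer degrees of freedom.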

4-13

Other Specification Criteria

Besides the four criteria outlined on p. 4-9, there are other specification criteria.

1. Ramsey’s Regression Specification Error Test (RESET)
2. Akaike’s Information Criterion (AIC)
3. Schwarz Criterion (SC)

4-14

Ramsey’s Regression Specification Error Test (RESET)

$H_0$: There is no specification error.

$H_1$: There is a specification error.

If the F-value from the RESET is higher than the critical F-value, we can reject $H_0$, meaning that there is a specification error. However, RESET doesn’t tell how to correct it.

If the F-value from the RESET is lower than the critical F-value, we cannot reject $H_0$, meaning that we probably have a correct specification.

RESET is more useful in confirming our model than in telling us what’s wrong and how to correct our model.
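The slides do not spell out how the RESET F-value is obtained. One common version, shown here as an assumption rather than as the course's exact procedure, adds powers of the fitted values to the original equation and tests them jointly:

```latex
\begin{aligned}
\text{Original:}\quad  & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon \\
\text{Augmented:}\quad & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2
                         + \beta_3 \hat{Y}^2 + \beta_4 \hat{Y}^3 + \beta_5 \hat{Y}^4 + \varepsilon \\
H_0:\quad & \beta_3 = \beta_4 = \beta_5 = 0 \\
F &= \frac{(RSS_M - RSS)/M}{RSS/(n - K - 1)}, \qquad M = 3
\end{aligned}
```

Here $\hat{Y}$ are the fitted values from the original equation, $RSS_M$ is the residual sum of squares from the original equation, $RSS$ is that from the augmented equation, and K counts the independent variables in the augmented equation.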

4-15

AIC and SC

AIC and SC are used to compare two regression models.

Both AIC and SC penalize the addition of another independent variable if it doesn’t improve the overall fit significantly.

Between two regression models, the one with lower AIC and SC values is preferred.
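The slides give no formulas for the two criteria. The sketch below assumes a common textbook form, AIC = ln(RSS/n) + 2(K+1)/n and SC = ln(RSS/n) + ln(n)(K+1)/n; statistical packages often report differently scaled versions, so only comparisons within one definition are meaningful. The RSS values are hypothetical.

```python
import math

def aic(rss, n, k):
    """Akaike's Information Criterion (assumed form): ln(RSS/n) + 2(K+1)/n."""
    return math.log(rss / n) + 2.0 * (k + 1) / n

def sc(rss, n, k):
    """Schwarz Criterion (assumed form): ln(RSS/n) + ln(n)(K+1)/n."""
    return math.log(rss / n) + math.log(n) * (k + 1) / n

n = 40
model_a = {"rss": 120.0, "k": 2}    # hypothetical smaller model
model_b = {"rss": 118.0, "k": 3}    # hypothetical model with one extra variable

for name, m in (("A", model_a), ("B", model_b)):
    print(name, aic(m["rss"], n, m["k"]), sc(m["rss"], n, m["k"]))
```

In this made-up case the extra variable lowers RSS only slightly, so both criteria are lower for model A and the smaller model is preferred.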

4-16

Choosing a Functional Form

Once you have a set of independent variables, you still need to specify a functional form.

That is, how Y is related to each X.

About the intercept term:

1. Do not suppress the intercept term, even if the theory suggests doing so.
2. Do not rely on the estimate of the intercept term for analysis or inference.

4-17

Different Functional Forms

1. Linear Form: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
2. Double-Log Form: $\ln Y = \beta_0 + \beta_1 \ln X_1 + \varepsilon$
3. Semilog Form: $\ln Y = \beta_0 + \beta_1 X_1 + \varepsilon$, or $Y = \beta_0 + \beta_1 \ln X_1 + \varepsilon$
4. Polynomial Form: $Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_1)^2 + \beta_3 (X_1)^3 + \varepsilon$
5. Inverse Form: $Y = \beta_0 + \beta_1 (1/X_1) + \varepsilon$

Theory usually suggests only the signs of the coefficients, not the functional form.

4-18

Linear Form

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon$

The slope is constant: $\dfrac{\Delta Y}{\Delta X_1} = \beta_1$

But the elasticity of Y with respect to X is not constant: $Elasticity_{Y,X_1} = \dfrac{\Delta Y / Y}{\Delta X_1 / X_1} = \dfrac{\Delta Y}{\Delta X_1} \cdot \dfrac{X_1}{Y} = \beta_1 \dfrac{X_1}{Y}$

Unless the theory suggests otherwise, the linear form should be used.
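A quick calculation shows how the elasticity changes along a straight line even though the slope does not. This sketch uses a one-variable linear model with hypothetical coefficients and ignores the error term.

```python
def elasticity_linear(beta0, beta1, x):
    """Elasticity of Y with respect to X at the point x for Y = beta0 + beta1*X."""
    y = beta0 + beta1 * x
    return beta1 * x / y

beta0, beta1 = 10.0, 2.0    # hypothetical coefficients
for x in (1.0, 5.0, 20.0):
    print(x, elasticity_linear(beta0, beta1, x))
```

The slope is 2 at every point, but the elasticity rises as X grows because X/Y changes along the line.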

4-19

Double-Log Form

$\ln Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \dots + \varepsilon$

This is another popular form besides the linear form.

It is also known as the “Log-linear” form.

The slope is not constant.

But the elasticity of Y with respect to X is constant: $Elasticity_{Y,X_1} = \dfrac{\Delta(\ln Y)}{\Delta(\ln X_1)} = \dfrac{\Delta Y / Y}{\Delta X_1 / X_1} = \beta_1$

See p. 212 for more information.
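The constant elasticity follows from differentiating the double-log equation with respect to X1 while holding the other variables fixed. A brief derivation in calculus notation (the slides use discrete changes):

```latex
\begin{aligned}
\ln Y &= \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \varepsilon \\
\frac{1}{Y}\,\frac{\partial Y}{\partial X_1} &= \beta_1 \frac{1}{X_1}
\quad\Longrightarrow\quad
\frac{\partial Y / Y}{\partial X_1 / X_1} = \beta_1 \\
\frac{\partial Y}{\partial X_1} &= \beta_1 \frac{Y}{X_1}
\end{aligned}
```

The last line is the slope, which depends on the values of Y and X1 and so is not constant.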

4-20

Semilog Form

$\ln Y = \beta_0 + \beta_1 X_1 + \beta_2 \ln X_2 + \varepsilon$, or

$Y = \beta_0 + \beta_1 \ln X_1 + \beta_2 X_2 + \varepsilon$

Similar to the Double-Log form, except that some variables, but not all, are in log form.

See p. 214 for more information.

4-21

Polynomial Form

$Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_1)^2 + \beta_3 (X_1)^3 + \varepsilon$

Appropriate for a model where changes in X cause Y to increase (or decrease) over some range and to decrease (or increase) over another range.

See p. 217 for more information.
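The change of direction shows up in the slope of the polynomial, which is β1 + 2β2X1 + 3β3X1² and can switch sign as X1 moves. A small sketch with hypothetical coefficients (a quadratic, with the cubic coefficient set to zero):

```python
def marginal_effect(beta1, beta2, beta3, x):
    """Slope dY/dX of Y = beta0 + beta1*X + beta2*X^2 + beta3*X^3 at the point x."""
    return beta1 + 2.0 * beta2 * x + 3.0 * beta3 * x ** 2

beta1, beta2, beta3 = 6.0, -1.0, 0.0    # hypothetical coefficients
for x in (1.0, 3.0, 5.0):
    print(x, marginal_effect(beta1, beta2, beta3, x))
```

With these numbers Y rises with X below X = 3 and falls above it.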

4-22

Inverse Form

$Y = \beta_0 + \beta_1 (1/X_1) + \varepsilon$

Appropriate for a model where the impact of an independent variable approaches zero as its value gets large.

See p. 219 for more information.

4-23

Selecting a Functional Form

Rely on your theory. What does your theory tell you about the relationships?

Do not compare Adj-R² from a linear-in-variables model with Adj-R² from a nonlinear-in-variables model, because
– Adj-R² values are not comparable when Y is transformed. Use Quasi-R² for comparison instead:

$\text{Quasi-}R^2 = 1 - \dfrac{\sum \left[ Y_i - \text{antilog}(\widehat{\ln Y_i}) \right]^2}{\sum \left( Y_i - \bar{Y} \right)^2}$

– Adj-R² may look good inside the range of the sample, but could look bad outside the range of the sample.
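Quasi-R² converts the fitted values of ln Y back to the original units of Y before measuring the fit. A minimal sketch, assuming natural logs (so the antilog is the exponential function) and made-up observations and fitted values:

```python
import math

def quasi_r2(y, ln_y_hat):
    """Quasi-R^2: 1 - sum (Y_i - exp(fitted ln Y_i))^2 / sum (Y_i - mean Y)^2."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - math.exp(li)) ** 2 for yi, li in zip(y, ln_y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / tss

y = [1.0, 2.0, 4.0, 8.0]            # hypothetical observations of Y
ln_y_hat = [0.1, 0.6, 1.4, 2.0]     # hypothetical fitted values of ln Y
print(quasi_r2(y, ln_y_hat))
```

The result is on the same scale as the R² of a model with untransformed Y, which is what makes the comparison possible.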

4-24

Dummy Variables

1. Dummy Intercept takes the form: $Y = \beta_0 + \beta_1 X_1 + \beta_2 D + \varepsilon$
2. Dummy Slope takes the form: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 D + \varepsilon$
3. Both Dummy Intercept and Dummy Slope take the form: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1 D + \beta_3 D + \varepsilon$

** We will discuss the concept and use of dummy variables again in “Panel Data Regression” if we have time.
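Writing out the third form separately for the two values of the dummy shows what each coefficient does. This assumes D takes only the values 0 and 1:

```latex
\begin{aligned}
D = 0:\quad & Y = \beta_0 + \beta_1 X_1 + \varepsilon \\
D = 1:\quad & Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_2)\,X_1 + \varepsilon
\end{aligned}
```

So $\beta_3$ shifts the intercept and $\beta_2$ shifts the slope for the observations with D = 1.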

4-25

Appendix: General F-Test

Any F-test we’ve seen so far can be thought of as a special case of the general F-test.

The general F-test tests more than one coefficient at a time.

The null hypothesis for the general F-test is what we think is correct.

We usually want to “accept” $H_0$.

This contrasts with the traditional way of hypothesis testing we’ve learned.

4-26

Steps in General F-Test

1. Specify the null and alternative hypotheses.
2. The null hypothesis will be used as a constraint to be put on the equation.
3. Calculate the RSSs from the constrained and the unconstrained equations.
4. If the fits of the two equations are not significantly different, we will “accept” $H_0$. If the fits are significantly different, reject $H_0$.

4-27

F-statistic:

$F = \dfrac{(RSS_C - RSS_U)/M}{RSS_U/(n - K - 1)}$

$RSS_C$ = Residual Sum of Squares from the constrained equation

$RSS_U$ = Residual Sum of Squares from the unconstrained equation

M = # of constraints

K = # of independent variables in the unconstrained equation

n = # of observations

4-28

Example of General F-Test

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$ ----- (1)

Suppose you think $\beta_1 = \beta_3 = \beta_4 = 0$.

In other words, $Y = \beta_0 + \beta_2 X_2 + \varepsilon$. ----- (2)

Therefore, your $H_0$ is $\beta_1 = \beta_3 = \beta_4 = 0$.

And your $H_1$: The original model fits the data OK.

You’ll run OLS of (1) and obtain $RSS_U$.

You’ll run OLS of (2) and obtain $RSS_C$.

Substitute $RSS_U$ and $RSS_C$ from (1) and (2) into the F-statistic formula.

Then, compare your F-value with the critical F-value and decide whether or not to reject $H_0$.

Note that for this example, K = 4 and M = 3.
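The arithmetic for this example can be sketched as follows. K = 4 and M = 3 come from the example; the sample size and the two RSS values are hypothetical, since the slides give none.

```python
def general_f(rss_c, rss_u, m, n, k):
    """F = ((RSS_C - RSS_U) / M) / (RSS_U / (n - K - 1))."""
    return ((rss_c - rss_u) / m) / (rss_u / (n - k - 1))

k, m = 4, 3                  # from the example: equation (1) has 4 X's, H0 has 3 constraints
n = 35                       # hypothetical number of observations
rss_u, rss_c = 300.0, 330.0  # hypothetical RSS from equations (1) and (2)

f_value = general_f(rss_c, rss_u, m, n, k)
print("F =", f_value, "with degrees of freedom", (m, n - k - 1))
```

The F-value is then compared with the critical value for M and n − K − 1 degrees of freedom, here 3 and 30.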

4-29

Chow Test

The Chow test checks whether two data sets can be combined into one data set because the slopes are not statistically different.

Put differently, it tests whether there is no structural change in the model between the two data sets (e.g., before and after a war).

$H_0$: Slopes in the two data sets are not different (no structural change).

$H_1$: Slopes in the two data sets are different (there is structural change).

4-30

Steps in Chow Test

1. Run two separate OLS regressions with the same specification for each data set and record the RSS from each data set. Call these $RSS_1$ and $RSS_2$.
2. Combine the two data sets into one and run OLS with the same specification again, and record the RSS. Call it $RSS_T$.

$F = \dfrac{(RSS_T - RSS_1 - RSS_2)/(K + 1)}{(RSS_1 + RSS_2)/(N_1 + N_2 - 2K - 2)}$

K = # of independent variables

$N_1$ = # of observations in sample 1

$N_2$ = # of observations in sample 2

3. Reject $H_0$ if F-value > critical F-value.
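Once the three regressions have been run, the Chow F-value is simple arithmetic. A minimal sketch with hypothetical RSS values and sample sizes:

```python
def chow_f(rss_t, rss_1, rss_2, n1, n2, k):
    """F = ((RSS_T - RSS_1 - RSS_2) / (K + 1)) / ((RSS_1 + RSS_2) / (N1 + N2 - 2K - 2))."""
    numerator = (rss_t - rss_1 - rss_2) / (k + 1)
    denominator = (rss_1 + rss_2) / (n1 + n2 - 2 * k - 2)
    return numerator / denominator

# Hypothetical inputs: pooled RSS, RSS from each sub-sample, sample sizes, number of X's.
f_value = chow_f(rss_t=500.0, rss_1=200.0, rss_2=220.0, n1=25, n2=27, k=2)
print("F =", f_value, "with degrees of freedom", (2 + 1, 25 + 27 - 2 * 2 - 2))
```

The critical value is taken from the F-table with K + 1 and N1 + N2 − 2K − 2 degrees of freedom, here 3 and 46.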