Cross Section Answers


  • 8/8/2019 Cross Section Answers

    1/22

    Economic Studies 2008-2009

    60052 Cross Section Econometrics

    This document provides answers to the May Examinations for 2007 and later. Answers for ES5052 May 2006 and before are in a separate document.

    Martyn Andrews and Mette Christensen, xx/05/2009.


    Answers to May 2009 Examination

    1. (a) Compute differences and 2 more columns:

        t  i   Δy   Δx1  Δx2  Δx3  ΔyΔx1  (Δx1)^2
        1  1    .    .    .    .     .       .
        2  1   -6    1    0    2    -6       1
        1  2    .    .    .    .     .       .
        2  2    4    1    0    2     4       1
        1  3    .    .    .    .     .       .
        2  3    0    0    0    2     0       0
        1  4    .    .    .    .     .       .
        2  4   -2   -1    0    2     2       1
        1  5    .    .    .    .     .       .
        2  5    2    0    0    2     0       0
        Sum                          0       3

    The FD estimator is given by:

    $$\hat\beta = \frac{\sum \Delta y\,\Delta x_1}{\sum (\Delta x_1)^2} = \frac{0}{3} = 0.$$

    [4]. Dropping i = 3 makes no difference to either numerator or denominator, ie $\hat\beta = 0$ [2]. [6 marks]

    (b) $\Delta x_3 = 2$ for i = 1, ..., 5 and $\Delta d2_t = 1$ for i = 1, ..., 5.[2] Adding them both won't work because they are collinear with each other.[1] (Each on its own will work because there is no constant.) [3 marks]

    (c) Construct the following (see Exercise C):

        Δy      Δx1    ΔyΔx1    (Δx1)^2
        Δy_1     0       0         0
        Δy_2     0       0         0       stayers
        ...     ...     ...       ...
                 0       0         0
        Δy_i    -1     -Δy_i       1
        Δy_i    -1     -Δy_i       1       n_q quitters
        ...     ...     ...       ...
        Δy_i     1      Δy_i       1
        Δy_i     1      Δy_i       1
        Δy_i     1      Δy_i       1       n_j joiners
        ...     ...     ...       ...
        Δy_n     1      Δy_n       1

    Hence we can write:

    $$\hat\beta = \frac{\sum_i \Delta y_i\,\Delta x_{1i}}{\sum_i \Delta x_{1i}^2} = \frac{\sum_{i \in j} \Delta y_i - \sum_{i \in q} \Delta y_i + 0}{n_q + n_j + 0}$$

    By ignoring data for whom $\Delta x_{1i} = 0$, we can see that the top block contributes nothing to $\hat\beta$. [7 marks]

    (d) Now construct mean-deviations:

        t  i    y   x1  x2  x3     ÿ    ẍ1   ẍ2   ÿẍ1
        1  1   10    0   1  24     3  -1/2    0  -3/2
        2  1    4    1   1  26    -3   1/2    0  -3/2
        1  2    3    0   1  34    -2  -1/2    0     1
        2  2    7    1   1  36     2   1/2    0     1
        1  3    1    0   0  51     0     0    0     0
        2  3    1    0   0  53     0     0    0     0
        1  4    5    1   0  44     1   1/2    0   1/2
        2  4    3    0   0  46    -1  -1/2    0   1/2
        1  5    2    1   0  17    -1     0    0     0
        2  5    4    1   0  19     1     0    0     0

    Here ÿ, ẍ1 and ẍ2 are deviations from each individual's own mean over the two periods.

    The FE estimator is given by:

    $$\hat\beta = \frac{\sum \ddot y\,\ddot x_1}{\sum (\ddot x_1)^2} = \frac{0}{3/2} = 0.$$

    [5]. This demonstrates that, for T = 2, FD and FE are identical.[2] One cannot add x2 to this regression because $\ddot x_2 = 0$ for everyone. It's a male dummy, possibly.[2] [9 marks]
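The FD and FE calculations in parts (a) and (d) can be checked with a short script. This is only a sketch: the `panel` dictionary re-enters the (y, x1) values from the table in part (d), and the function names are illustrative rather than anything used in the course.

```python
from fractions import Fraction

# (y, x1) in periods t = 1 and t = 2 for individuals i = 1..5,
# copied from the levels table in part (d).
panel = {
    1: ((10, 0), (4, 1)),
    2: ((3, 0), (7, 1)),
    3: ((1, 0), (1, 0)),
    4: ((5, 1), (3, 0)),
    5: ((2, 1), (4, 1)),
}


def first_difference_sums(data):
    """Return (sum of dy*dx, sum of dx^2) using period-2 minus period-1 differences."""
    num = den = Fraction(0)
    for (y1, x1), (y2, x2) in data.values():
        dy, dx = y2 - y1, x2 - x1
        num += dy * dx
        den += dx * dx
    return num, den


def fixed_effects_sums(data):
    """Return the same two sums using deviations from each individual's mean."""
    num = den = Fraction(0)
    for periods in data.values():
        y_bar = Fraction(sum(y for y, _ in periods), len(periods))
        x_bar = Fraction(sum(x for _, x in periods), len(periods))
        for y, x in periods:
            num += (y - y_bar) * (x - x_bar)
            den += (x - x_bar) ** 2
    return num, den


fd_num, fd_den = first_difference_sums(panel)
fe_num, fe_den = fixed_effects_sums(panel)
beta_fd = fd_num / fd_den
beta_fe = fe_num / fe_den
print("FD:", fd_num, "/", fd_den, "=", beta_fd)
print("FE:", fe_num, "/", fe_den, "=", beta_fe)
```

Exact fractions are used so that the denominators 3 and 3/2 appear without rounding noise.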

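    2. (a) Just state that the Robust Covariance Matrix can be written as:

    $$\mathrm{Avar}(\hat\beta) = (X'X)^{-1}\left[\sum_{i=1}^{n} \hat u_i^2\, x_i x_i'\right](X'X)^{-1}$$

    Also, because

    $$\sum_{i=1}^{n} \hat u_i^2\, x_i x_i' = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} \hat u_1^2 & 0 & \dots & 0 \\ 0 & \hat u_2^2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \dots & & \hat u_n^2 \end{bmatrix} \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix} = X'\hat\Omega X,$$

    this can be written:

    $$\mathrm{Avar}(\hat\beta) = (X'X)^{-1} X'\hat\Omega X (X'X)^{-1}.$$

    Note that $X'$ is k × n, $\hat\Omega = \mathrm{diag}\{\hat u_i^2\}$ is n × n, and $X$ is n × k. [4 for either formula]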

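    (b) Write the model as

    $$y_i = \beta_0 + \beta_1 x_i + u_i,$$

    and then write in mean deviation form

    $$y_i - \bar y = \beta_1 (x_i - \bar x) + u_i.$$

    Now $X'$ is a 1 × n matrix:

    $$X' = [\,x_1 - \bar x,\ x_2 - \bar x,\ \dots,\ x_n - \bar x\,]$$

    which means that

    $$X'X = \sum_{i=1}^{n} (x_i - \bar x)^2 = SST_x$$

    and

    $$(X'X)^{-1} = 1 \Big/ \sum_{i=1}^{n} (x_i - \bar x)^2 = 1/SST_x.$$

    Next,

    $$X'\hat\Omega X = [\,x_1 - \bar x,\ \dots,\ x_n - \bar x\,] \begin{bmatrix} \hat u_1^2 & 0 & \dots & 0 \\ 0 & \hat u_2^2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \dots & & \hat u_n^2 \end{bmatrix} \begin{bmatrix} x_1 - \bar x \\ x_2 - \bar x \\ \vdots \\ x_n - \bar x \end{bmatrix} = \sum_{i=1}^{n} (x_i - \bar x)^2\, \hat u_i^2.$$

    Finally,

    $$(X'X)^{-1}(X'\hat\Omega X)(X'X)^{-1} = \frac{1}{SST_x}\left[\sum_{i=1}^{n} (x_i - \bar x)^2 \hat u_i^2\right]\frac{1}{SST_x}.$$

    QED. [8 marks]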
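The scalar sandwich formula derived in (b) is easy to evaluate by hand on a toy sample. The sketch below is illustrative only: the data `x` and residuals `u_hat` are made up (chosen so the residuals sum to zero and are orthogonal to x, as OLS residuals would be), and the traditional variance uses the usual n − 2 degrees-of-freedom divisor, which is an assumption not stated in the answer.

```python
def slope_variances(x, u_hat):
    """Return (robust, traditional) variance estimates for the slope
    in a simple regression, given regressor values and OLS residuals."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dev)

    # Robust: (1/SST_x) * sum((x_i - x_bar)^2 * u_i^2) * (1/SST_x)
    meat = sum(d * d * u * u for d, u in zip(dev, u_hat))
    robust = meat / sst_x ** 2

    # Traditional: sigma^2 / SST_x with sigma^2 = SSR / (n - 2)
    sigma2 = sum(u * u for u in u_hat) / (n - 2)
    traditional = sigma2 / sst_x
    return robust, traditional


# Hypothetical data: residuals sum to zero and are orthogonal to x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u_hat = [0.5, -1.0, 0.0, 1.0, -0.5]

robust_var, traditional_var = slope_variances(x, u_hat)
print(robust_var, traditional_var)
```

In this toy sample the large residuals sit near the middle of the x range, so the robust variance comes out below the traditional one; the ordering can go either way in general.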

    (c) Write the model as:

    $$y = \beta_1 + \beta_2 d_2 + \dots + \beta_5 d_5 + u,$$

    where the $d_k$ represent the age dummies. The traditional standard errors are approx 0.022 for all 4 dummies, whereas the robust equivalents are 0.018, which is about 20% lower. This suggests that there is some heteroskedasticity.[4]

    A formal test comes from assuming that the heteroskedasticity is linear in the dummies. Under the null of homoskedasticity:

    $$\mathrm{Var}(u|d_1, \dots, d_5) = E(u^2|d_1, \dots, d_5) = \sigma^2.$$

    To implement the test, run the following regression:

    $$\hat u^2 = \delta_1 + \delta_2 d_2 + \dots + \delta_5 d_5 + v,$$

    and test $H_0: \delta_2 = \dots = \delta_5 = 0$ (4 restrictions) using an F-test. The observed F statistic is 15.9 (pv=0.0000), and so there is considerable evidence of heteroskedasticity.[4]

    The regression also shows that there is considerable variation in the sample variation of y across the 5 age categories, going from 0.17 to 0.24, 0.32, 0.35, and 0.34.[2]

    The point here is that, because there is heteroskedasticity, one cannot use the F-test. One has to use the Robust Covariance Matrix and test the restrictions jointly (a Wald test; test in Stata).[3] [I gave one mark if students noted that all the estimates are significant in the Robust regression.]


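    3. (a) The IV estimator is

    $$\hat\beta_{1,IV} = \frac{\sum (z_i - \bar z)(y_i - \bar y)}{\sum (z_i - \bar z)(x_i - \bar x)} \qquad (1)$$

    [1] and its estimated standard error is

    $$se(\hat\beta_{1,IV}) = \frac{\hat\sigma}{\sqrt{SST_x\, R^2_{x,z}}} \qquad (2)$$

    [1], where $R^2_{x,z}$ is from regressing x on 1 and z [1], $\hat\sigma$ is the estimated standard error of u [1], and $SST_x$ is the total sum-of-squares of x [1]. [5 marks]

    (b) Replace z by x in the expression for $\hat\beta_{1,IV}$:

    $$\hat\beta_{1,OLS} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}. \qquad (3)$$

    [1] If x acts as the IV in place of z, the instrument and the regressor are perfectly collinear, and so $R^2_{x,z} = 1$ [1].

    Substituting,

    $$se(\hat\beta_{1,OLS}) = \frac{\hat\sigma}{\sqrt{SST_x}} \qquad (4)$$

    [1] [3 marks]

    (c) The ratio of estimated standard errors is:

    $$\frac{se(\hat\beta_{1,IV})}{se(\hat\beta_{1,OLS})} = \frac{1}{\sqrt{R^2_{x,z}}}$$

    [2]. As $R^2_{x,z} \le 1$, IV standard errors are always bigger [1]. (When $R^2_{x,z} = 1$, it's OLS.) [3 marks]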
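To see how quickly the ratio in (c) grows as the instrument weakens, a minimal sketch follows. The function names are hypothetical, and the calculation takes the formulas in (2) and (4) at face value, i.e. it assumes the same estimate of σ is used in both standard errors.

```python
import math


def se_ratio_iv_to_ols(r2_xz):
    """Ratio se(IV)/se(OLS) = 1/sqrt(R^2_{x,z}) from part (c)."""
    if not 0.0 < r2_xz <= 1.0:
        raise ValueError("R-squared must lie in (0, 1]")
    return 1.0 / math.sqrt(r2_xz)


def implied_r2(ratio):
    """First-stage R-squared implied by a given se(IV)/se(OLS) ratio."""
    if ratio < 1.0:
        raise ValueError("the ratio cannot be below 1")
    return 1.0 / ratio ** 2


for r2 in (1.0, 0.25, 0.04, 0.01):
    print(r2, se_ratio_iv_to_ols(r2))
```

Read in reverse, an IV standard error several times the OLS one points to a first-stage R-squared of only a few percent under this formula.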

    (d) The OLS estimate of the return to education is 0.101 (0.0066), which is standard [1]. However, because u contains ability, which causes E(u|x) > 0, OLS is biased upwards [1].

    Both instruments are family background variables. For either to be a valid IV,

    i. E(u|z) = 0 [1], ie family background should be uncorrelated with ability. This is the standard assumption [1].

    ii. z does not belong in the model being estimated [1]. Employers do not give extra pay because an individual has more/fewer siblings, or because they come from a broken home [1].

    iii. E(x|z) ≠ 0 [1]. In the RF regression of educ on the 2 IVs, they are significant (t-stats 5.1 and 7.6), but the R² is worryingly low [1].

    For the broken home variable, the IV return to education is 0.221 (0.0474) [1]. This moves the wrong way [1], and the IV se is 8 times bigger! [1]. For number of siblings, the IV estimate doesn't move, ie is 0.110 (0.0300), and the IV se is 5 times bigger [1].

    The effects of the 2 IVs on the IV return to education differ [1], but one cannot choose between them because both are weak and do not move the IV estimate in the right direction [1].

    (At this point one can get about 14 marks. 2 extra are awarded if one notes that the model could be estimated using both IVs at the same time because the model is then over-identified [1]. One would use GIVE/2SLS [1].)

    4. Binary choice model compared with linear probability model.

    (a) Men are more likely to have an affair than women, since the estimated coefficient on male is positive. This can be seen from both the linear probability model and from the logit model. The effect is not significant at the 5% level, though, since the p-value is larger than 5% (0.081). The difference in probability of having an affair between men and women according to the linear probability model is $\hat\beta^{OLS}_{male} = 0.0626079$, i.e. men are 6.26 percentage points more likely to have an affair than women. According to the logit model:

    $$P(y = 1 \mid male = 1, \bar x) = F(\hat\beta_0 + \hat\beta_1 \overline{yrsmarr} + \hat\beta_2 \overline{relig} + \hat\beta_3 + \hat\beta_4 \overline{age})$$

    $$= F(-.0198 + .1282(8.1777) + (-.3464)(3.1165) + .3521 + (-.039967)(32.4875))$$

    $$= \frac{\exp(-.9973)}{1 + \exp(-.9973)} = .269473$$

    $$P(y = 1 \mid male = 0, \bar x) = F(\hat\beta_0 + \hat\beta_1 \overline{yrsmarr} + \hat\beta_2 \overline{relig} + \hat\beta_4 \overline{age}) = \frac{\exp(-1.3494)}{1 + \exp(-1.3494)} = .205969,$$

    i.e. men are 6.35 percentage points more likely than women to have an affair.

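    (b) The effect of years of being married on the probability of having an affair for 25 year old males:

    $$\frac{\partial P(y = 1 \mid male = 1, \bar x, age = 25)}{\partial\, yrsmarr} = \hat\beta_1 f(\hat\beta_0 + \hat\beta_1 \overline{yrsmarr} + \hat\beta_2 \overline{relig} + \hat\beta_3 + 25\hat\beta_4)$$

    $$= (.1282)\frac{\exp(-.6980)}{(1 + \exp(-.6980))^2} = .0284$$

    And for 40 year old females:

    $$\frac{\partial P(y = 1 \mid male = 0, \bar x, age = 40)}{\partial\, yrsmarr} = \hat\beta_1 f(\hat\beta_0 + \hat\beta_1 \overline{yrsmarr} + \hat\beta_2 \overline{relig} + 40\hat\beta_4)$$

    $$= (.1282)\frac{\exp(-1.64956)}{(1 + \exp(-1.64956))^2} = .0173$$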

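    (c) The effect of years of being married on the probability of having an affair for a very religious person (coded 5):

    $$\frac{\partial P(y = 1 \mid relig = 5, \bar x)}{\partial\, yrsmarr} = \hat\beta_1 f(\hat\beta_0 + \hat\beta_1 \overline{yrsmarr} + 5\hat\beta_2 + \hat\beta_3 \overline{male} + \hat\beta_4 \overline{age})$$

    $$= .1282\, f(-.0197 + .1282(8.177) + (-.346)5 + .3521(.4759) + (-.03997)32.488)$$

    $$= .1282\,\frac{\exp(-1.8331)}{(1 + \exp(-1.8331))^2} = .0152$$

    The effect of years of being married on the probability of having an affair for an anti-religious person (coded 1):

    $$\frac{\partial P(y = 1 \mid relig = 1, \bar x)}{\partial\, yrsmarr} = \hat\beta_1 f(\hat\beta_0 + \hat\beta_1 \overline{yrsmarr} + \hat\beta_2 + \hat\beta_3 \overline{male} + \hat\beta_4 \overline{age})$$

    $$= .1282\,\frac{\exp(-.4475)}{(1 + \exp(-.4475))^2} = .0305$$

    (The index for relig = 1 is the relig = 5 index plus 4 × .3464, i.e. −1.8331 + 1.3856 = −.4475.)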

    (d) According to the LPM, all four effects are the same: $\hat\beta^{OLS}_{yrsmarr} = .02268$.

    (e) In the LPM, the marginal effects are constant, whereas in the logit model they can vary with other observables. This is where the logit model becomes richer than the LPM. From our logit estimates we can see that the effect of yrsmarr on the probability of having an affair is very different for different age, gender and religiosity groups (.0284 or .0173 or .0152 or .0305, depending on age, gender and relig). These effects would all be .02268 in the LPM.
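The logit probabilities and marginal effects in parts (a) and (b) can be reproduced numerically. The sketch below rests on two assumptions: the coefficient values and sample means are those quoted in part (a), and the signs are the ones that make the quoted probabilities come out (negative constant, relig and age coefficients), since minus signs are not legible in every copy of the answer. Variable and function names are illustrative.

```python
import math


def logistic_cdf(z):
    """F(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    """f(z) = exp(z) / (1 + exp(z))^2 = F(z) * (1 - F(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


# Coefficients and sample means as quoted in part (a); signs are assumed.
coef = {"const": -0.0198, "yrsmarr": 0.1282, "relig": -0.3464,
        "male": 0.3521, "age": -0.039967}
means = {"yrsmarr": 8.1777, "relig": 3.1165, "male": 0.4759, "age": 32.4875}


def index(**overrides):
    """Linear index x'b at the sample means, with selected values overridden."""
    values = {**means, **overrides}
    return coef["const"] + sum(coef[name] * values[name] for name in means)


p_male = logistic_cdf(index(male=1))
p_female = logistic_cdf(index(male=0))

# Marginal effect of yrsmarr = coefficient times the logistic density at the index.
me_male_25 = coef["yrsmarr"] * logistic_pdf(index(male=1, age=25))
me_female_40 = coef["yrsmarr"] * logistic_pdf(index(male=0, age=40))

print(round(p_male, 4), round(p_female, 4), round(p_male - p_female, 4))
print(round(me_male_25, 4), round(me_female_40, 4))
```

The same `index` helper can be called with `relig=5` or `relig=1` to check part (c); in the LPM every one of these marginal effects collapses to the single OLS coefficient on yrsmarr.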

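    5. (a) The appropriate sample selection model:

    $$y_i^* = x_{1i}'\beta_1 + \varepsilon_1 \quad \text{(structural equation)}$$

    $$h_i^* = x_{2i}'\beta_2 + \varepsilon_2 \quad \text{(selection equation)}$$

    $$y_i = y_i^* \ \text{ if } h_i^* > 0$$

    $$y_i \ \text{not observed if } h_i^* \le 0$$

    $$\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & 1 \end{pmatrix} \right)$$

    (b) The OLS conditional mean is $E(y^*|x) = x_1'\beta_1$. The conditional mean, conditional on working, is

    $$E(y|x_1; h = 1) = \dots \text{bookwork} \dots = x_1'\beta_1 + \sigma_{12}\frac{\phi(x_2'\beta_2)}{\Phi(x_2'\beta_2)}.$$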

    (c) No. The model in the output has x1 = x2, which means that $\sigma_{12}$ is only identified off the fact that the inverse Mills ratio $\lambda(\cdot)$ is non-linear. But from the regression output in the appendix, we can see that a regression of the estimated $\hat\lambda$s on x1 = x2 has an R² = .9696, i.e. $\lambda(\cdot)$ is almost linear. This means that identification is very weak. A preferred model would have some identifying extra variables in the selection equation which do not affect wages. One such variable could be kidlt6. Another possibly age.
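The near-linearity of the inverse Mills ratio, which is what makes identification weak in (c), can be illustrated directly. The sketch below computes λ(z) = φ(z)/Φ(z) on a grid and the R² from a straight-line fit. The grid from −1 to 2 is a hypothetical range for the selection index, not taken from the exam output, so the resulting R² is only indicative and will not equal the .9696 reported there.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """lambda(z) = phi(z) / Phi(z), the selection-correction term."""
    return normal_pdf(z) / normal_cdf(z)


def r_squared_of_linear_fit(xs, ys):
    """R^2 from regressing ys on a constant and xs (squared correlation)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical range of selection-index values x2'b2.
grid = [-1.0 + 0.1 * k for k in range(31)]
mills = [inverse_mills(z) for z in grid]
r2 = r_squared_of_linear_fit(grid, mills)
print(round(r2, 4))
```

When a linear function of the regressors explains nearly all the variation in λ, the correction term is close to collinear with the wage-equation regressors, which is the point made in the answer.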

    6. (a) The kernel estimator:

    $$\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} k\left(\frac{x_i - x}{h}\right),$$

    where h = bandwidth, k(·) is the kernel function.

    (b) Bookwork. Students are not expected to derive the bias and variance, but must know the trade-off between bias and variance as a function of the bandwidth, and possibly also provide the expressions for the bias and variance.

    (c)

    $$\hat f(x) = \frac{10}{6} \times \begin{cases} 0, & x < .6 \\ .5, & .6 < x < .9 \\ 1, & .9 < x < 1 \\ 1, & 1 < x < 1.3 \\ .5, & 1.3 < x < 1.4 \\ 0, & x > 1.4 \end{cases}$$
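A uniform-kernel version of the estimator in (a) takes only a few lines. The sample {0.8, 1.1, 1.2} and bandwidth h = 0.2 used below are assumptions: the exam data are not reproduced in this answer key, and these values were picked only because they give 1/(nh) = 10/6 and reproduce the step function in (c).

```python
def uniform_kernel(u):
    """k(u) = 1/2 for |u| < 1 and 0 otherwise."""
    return 0.5 if abs(u) < 1.0 else 0.0


def kernel_density(x, sample, h, kernel=uniform_kernel):
    """f_hat(x) = (1 / (n h)) * sum_i k((x_i - x) / h)."""
    n = len(sample)
    return sum(kernel((xi - x) / h) for xi in sample) / (n * h)


# Hypothetical sample and bandwidth (not given in the source).
sample = [0.8, 1.1, 1.2]
h = 0.2

for x in (0.5, 0.7, 0.95, 1.15, 1.35, 1.5):
    print(x, round(kernel_density(x, sample, h), 4))
```

Each observation contributes a box of half-width h around itself, so the estimate is a step function whose height counts how many boxes overlap at x; a larger h smooths the steps at the cost of more bias.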


    Answers to May 2008 Examination

    1. [Comment: I reallocated the marks after I realised that nobody was answering part (f) as I expected. (f) is 2 less; (e) 2 more.]

    (a) When either = 1 (trivial) or when xit and dit are (exactly) uncorrelated in the sample.[3 marks] [Comment: this extremely basic idea has been emphasised repeatedly in the course.] Basic OLS algebra applied to (2) gives

    $$\hat\beta = \Delta\bar y_1 - \Delta\bar y_2,$$

    where 1 denotes migrants and 2 denotes non-migrants.[3 marks] [This algebra comes straight from the lecture notes. Some tried applying OLS to (1) or the first (2).] An alternative answer that ignores the fact that $x_{i2}$ is a dummy is [3]:

    $$\hat\beta = \frac{\sum_i (\Delta y_i - \Delta\bar y)(x_{i2} - \bar x_2)}{\sum_i (x_{i2} - \bar x_2)^2}.$$

    [6 marks]

    (b) Denote the estimators as Pooled OLS and FD respectively. For FD:

    $$E(\Delta x_i \Delta u_i) = E[(x_{i2} - x_{i1})(u_{i2} - u_{i1})] = 0.$$

    4 sufficient conditions are:

    $$E(x_{i1}u_{i1}) = 0$$

    $$E(x_{i2}u_{i2}) = 0$$

    $$E(x_{i1}u_{i2}) = 0$$

    $$E(x_{i2}u_{i1}) = 0.$$

    Pooled OLS doesn't need the last two, and is therefore the weaker assumption. [This algebra comes straight from the lecture notes on FD.]

    [3 marks]

    (c) The first Stata regression is (1) and the second is (2). Iow, the same model using the same data is being estimated by OLS and FD resp. Because the second model removes 3275 fixed effects $a_i$, FD has 3275 fewer obs.[2 marks]

    The F-statistic is

    $$F = \frac{1551.63 - 195.23}{195.23} \times \frac{3275 - 2}{3274} = 6.946,$$

    and so reject $H_0: a_i = a$.[2 marks]

    [4 marks]
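The F-statistic in (c) follows the usual restricted-versus-unrestricted form. In the sketch below, 1551.63 and 195.23 are taken to be the restricted and unrestricted residual sums of squares, with 3274 restrictions and 3275 − 2 residual degrees of freedom; that reading is inferred from the layout of the formula and is an assumption, as are the function and argument names.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, n_restrictions, df_unrestricted):
    """F = [(SSR_r - SSR_u) / q] / [SSR_u / df_u]."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


f_stat = f_statistic(
    ssr_restricted=1551.63,     # assumed: pooled OLS residual sum of squares
    ssr_unrestricted=195.23,    # assumed: residual sum of squares with the a_i removed
    n_restrictions=3274,
    df_unrestricted=3275 - 2,
)
print(round(f_stat, 3))
```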

    (d) When the $a_i$ are not controlled for (Pooled OLS), $\hat\beta = 0.200$ and is almost significant; when the $a_i$ are controlled for (FD), $\hat\beta = 0.004$ and is insignificant. This is heterogeneity bias, because $a_i$ and $x_{it}$ are strongly correlated.[3 marks. I gave 2 marks if they just said that the $a_i$ are jointly significant.]


    (e) For a particular variable to be a valid IV, one needs to examine three conditions. First, should nokid actually belong in the wage equation? Probably not; employers don't give extra money for having kids.[2] Second, is nokid likely to be correlated with motivation? Again, probably not.[2] Thus far, nokid looks like a valid IV.

    Third, are nokid and move correlated? In a regression of move on nokid, there is no evidence that having kids makes one more or less likely to migrate (t-stat=0.78) (even though a priori one might expect kids to act as a constraint on migration). Also, the R-squared is 0.0035.[1] This is a weak instrument [1], and it means that the standard error on move in the IV regression is 86.9 compared with the OLS standard error of 0.104! [1]

    [7 marks]

    (f) Here the investigator is dealing with $a_i$ by differencing, but still believes that $E(\Delta x_i \Delta u_i) \neq 0$. Like in (e), nokid is a weak instrument for move.[1] One more mark [1] is available for mentioning any of the following.

    • If motivation can be plausibly modelled as a time-invariant effect, then it's very important.

    • To deal with either $E(x_{it}u_{it}) \neq 0$ or $E(\Delta x_i \Delta u_i) \neq 0$, we learn nothing using nokid as an IV.

    • There are only 22 migrants!

    [2 marks]

    [General comment: this question was badly answered, although all the material is very familiar. Maybe by examining two major parts of the course, FD and IV, in the same question, this confuses students.]

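    2. (a) The Traditional Covariance Matrix is given by [3 marks]:

    $$\sigma^2 (X'X)^{-1} = \sigma^2 \begin{bmatrix} n_1 & 0 \\ 0 & n_2 \end{bmatrix}^{-1} = \sigma^2 \begin{bmatrix} 1/n_1 & 0 \\ 0 & 1/n_2 \end{bmatrix}$$

    The Robust Covariance Matrix is given by [3 marks]:

    $$X'\hat\Omega X = \begin{bmatrix} 1 & 1 & \dots & 1 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 & 1 & \dots & 1 \end{bmatrix} \begin{bmatrix} \hat u_1^2 & 0 & \dots & 0 \\ 0 & \hat u_2^2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \dots & & \hat u_n^2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \sum_{i \in 1} \hat u_i^2 & 0 \\ 0 & \sum_{i \in 2} \hat u_i^2 \end{bmatrix}.$$

    Hence [3 marks]:

    $$(X'X)^{-1} X'\hat\Omega X (X'X)^{-1} = \begin{bmatrix} 1/n_1 & 0 \\ 0 & 1/n_2 \end{bmatrix} \begin{bmatrix} \sum_{i \in 1} \hat u_i^2 & 0 \\ 0 & \sum_{i \in 2} \hat u_i^2 \end{bmatrix} \begin{bmatrix} 1/n_1 & 0 \\ 0 & 1/n_2 \end{bmatrix} = \begin{bmatrix} \hat\sigma_1^2/n_1 & 0 \\ 0 & \hat\sigma_2^2/n_2 \end{bmatrix},$$

    where $\hat\sigma_g^2 = \sum_{i \in g} \hat u_i^2 / n_g$ for group g = 1, 2.

    With homoskedasticity, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, which delivers the Traditional Covariance matrix above.[1]

    [Comment: this question was taken straight from Exercise B. It appears that students just don't engage with matrices anymore - maybe a downside of Wooldridge's book?]

    [10 marks]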

    (b) There are two bits of evidence. First, to estimate Var(u|x1, ..., xk), take the squared residuals from the model and regress them on covariates. In this particular case, the investigator is estimating:

    $$\mathrm{Var}(u|male) = \delta_0 + \delta_1\, male.$$

    The estimate on male has a p-value of 0.123, and so one does not reject $H_0$: Var(u|male) = Var(u|female).[3] Also, in the raw data, one can see that se(logwahehr) is very similar for males and females (0.459 and 0.462 resp.).[1]

    Second, the Robust and Traditional standard errors are quite similar:

        male    0.0160      0.0160      (0%)
        age     0.00520     0.00474     (9.7%)
        agesq   0.0000626   0.0000583   (7.4%)

    Iow, the heteroskedasticity is quite weak.[3]

    [7 marks]

    (c) i. False. The LPM always exhibits heteroskedasticity, but heteroskedasticity does not affect unbiasedness or consistency in linear models.[3]

    ii. True.[2]

    iii. False. All inference is affected by heteroskedasticity (including Chow tests).[3]

    [8 marks]

    3. [Most of this is taken verbatim from the slides.]

    (a) We observe, for each individual, the observed outcome $Y_i$, not both $Y_i(0)$ and $Y_i(1)$:

    $$Y_i = W_i Y_i(1) + (1 - W_i) Y_i(0) \qquad (5)$$

    [3 marks]

    (b) The population average treatment effect (PATE) and the population average treatment effect for the treated (PATT) are resp:

    $$\tau_P = E[Y_i(1) - Y_i(0)] \qquad \tau_{P,T} = E[Y_i(1) - Y_i(0) \mid W_i = 1]$$

    The sample average treatment effect (SATE) and the sample average treatment effect for the treated (SATT) are resp:

    $$\tau_S = \frac{1}{N}\sum_{i=1}^{N} [Y_i(1) - Y_i(0)] \qquad \tau_{S,T} = \frac{1}{N_T}\sum_{i: W_i = 1} [Y_i(1) - Y_i(0)],$$

    where $N_T = \sum_{i=1}^{N} W_i$ is the number of treated individuals.
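The definitions in (a) and (b) can be made concrete with a toy table of potential outcomes. Everything in this sketch is hypothetical: in real data only one of Y(0) and Y(1) is observed per individual, so SATE and SATT cannot be computed this way; the example only illustrates what the formulas mean.

```python
def observed_outcomes(y1, y0, w):
    """Y_i = W_i * Y_i(1) + (1 - W_i) * Y_i(0)."""
    return [wi * a + (1 - wi) * b for a, b, wi in zip(y1, y0, w)]


def sate(y1, y0):
    """Sample average treatment effect over all N individuals."""
    effects = [a - b for a, b in zip(y1, y0)]
    return sum(effects) / len(effects)


def satt(y1, y0, w):
    """Sample average treatment effect over the N_T treated individuals."""
    effects = [a - b for a, b, wi in zip(y1, y0, w) if wi == 1]
    return sum(effects) / len(effects)


# Hypothetical potential outcomes and treatment indicators.
y_treated = [5, 6, 7, 8]      # Y_i(1)
y_control = [4, 4, 6, 5]      # Y_i(0)
treatment = [1, 0, 1, 0]      # W_i

y_obs = observed_outcomes(y_treated, y_control, treatment)
tau_s = sate(y_treated, y_control)
tau_st = satt(y_treated, y_control, treatment)
print(y_obs, tau_s, tau_st)
```

In this toy sample the two effects differ because the individuals with the largest gains happen to be untreated.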


    [6 marks]

    4. (a) The three selection models are constructed such that the first model is the best one: its explanatory variables in the wage equation are a strict subset of the explanatory variables in the selection equation, so it identifies the selection off these extra variables and not just off the non-linearity of the inverse Mills ratio. The second also includes nwifeinc and kidlt6 in the wage equation (hence violating the preference that the selection equation should contain variables that are not in the wage equation). The third model does not include the variables from the wage equation in the selection equation (hence violating the requirement that the set of explanatory variables in the wage equation should be a subset of the set of explanatory variables in the selection equation).

    (b) Since the t test on the inverse Mills ratio comes out insignificant (t=1.76), we cannot reject the null hypothesis of no sample selection bias. Hence, the potential problem of bias from simple OLS estimation is not a problem after all.

    (c) Comparing the estimated return to education from the OLS with the estimated return to education from the selection model: $\hat\beta_{OLS} = 0.09915$, $\hat\beta_{Heckman} = 0.1032796$. Hence, almost no difference at all, as is to be expected when the test for sample selection bias finds no bias.

    (d) The potential problem with simply running an OLS of log wages on education and other explanatory variables is that a substantial part of the women in the data set (41.68%, nearly half!) do not work, hence we do not observe their wages. This can potentially cause the OLS estimates to be biased. The conditional mean estimated by OLS is

    $$E(w|x) = x_1'\beta_1,$$

    whereas the conditional mean when taking the selection properly into account is

    $$E(w|x) = x_1'\beta_1 + \sigma_{12}\lambda(x_2'\beta_2),$$

    where $\lambda$ is the inverse Mills ratio. This shows that if $\sigma_{12} \neq 0$, OLS will be biased on the selected sample.

    5. (a) The logit model models the probability of the baby being underweight, P(dw = 1|·), so since the coefficient on cigs is positive and cigs is significant (t=2.22 or t=2.02), smoking during pregnancy does significantly increase the probability of the baby being underweight.

    (b) The difference in the probabilities of the baby being underweight between not smoking and smoking a pack of cigarettes per day (using the sample means of the other variables):

    $$\Pr(dw = 1 \mid 20 \text{ cigs}, \cdot) - \Pr(dw = 1 \mid 0 \text{ cigs}, \cdot)$$

    $$= F(\hat\beta_0 + \hat\beta_1 \cdot 20 + \hat\beta_2 \overline{drink} + \hat\beta_3 \overline{mage} + \hat\beta_4 \overline{meduc} + \hat\beta_5 \overline{npvis} + \hat\beta_6 \overline{wh}) - F(\hat\beta_0 + \hat\beta_2 \overline{drink} + \hat\beta_3 \overline{mage} + \hat\beta_4 \overline{meduc} + \hat\beta_5 \overline{npvis} + \hat\beta_6 \overline{wh})$$

    $$= F(-.3316 + .0281 \cdot 20 + .115 \cdot .0816 - .004 \cdot 29.5579 - .0234 \cdot 13.7181 - .0304 \cdot 11.1889 - .2618 \cdot .8772)$$

    $$\quad - F(-.3316 + .115 \cdot .0816 - .004 \cdot 29.5579 - .0234 \cdot 13.7181 - .0304 \cdot 11.1889 - .2618 \cdot .8772)$$

    $$= F(-.7765) - F(-1.3385) = \frac{e^{-.7765}}{1 + e^{-.7765}} - \frac{e^{-1.3385}}{1 + e^{-1.3385}} = .3151 - .2078 = .1073$$

    Between smoking 10 cigarettes and 20 cigarettes per day (using the sample means of the other variables):

    $$\Pr(dw = 1 \mid 20 \text{ cigs}, \cdot) - \Pr(dw = 1 \mid 10 \text{ cigs}, \cdot) = F(-.7765) - F(-1.3385 + .0281 \cdot 10)$$

    $$= F(-.7765) - F(-1.0575) = \frac{e^{-.7765}}{1 + e^{-.7765}} - \frac{e^{-1.0575}}{1 + e^{-1.0575}} = .3151 - .2578 = .0573$$

    (c) The marginal effect on the probability of having an underweight baby of the number of prenatal visits when the mother did not smoke during pregnancy (using the sample means of the other variables):

    $$\frac{\partial P(dw = 1 \mid cigs = 0, \bar x)}{\partial (npvis)} = \frac{e^{-1.3385}}{(1 + e^{-1.3385})^2}(-.0304) = .1646(-.0304) = -.005$$

    And when the mother smoked a pack of cigarettes per day during pregnancy (using the sample means of the other variables):

    $$\frac{\partial P(dw = 1 \mid cigs = 20, \bar x)}{\partial (npvis)} = \frac{e^{-.7765}}{(1 + e^{-.7765})^2}(-.0304) = .2158(-.0304) = -.0066$$

    [? MARKS]

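    (d) The same effects as in b. and c., but now from the linear regression model:

    $$\Pr(dw = 1 \mid cigs = 20, \cdot) - \Pr(dw = 1 \mid cigs = 0, \cdot) = .0054 \cdot 20 = .1080$$

    $$\Pr(dw = 1 \mid cigs = 20, \cdot) - \Pr(dw = 1 \mid cigs = 10, \cdot) = .0054 \cdot 10 = .054$$

    $$\frac{\partial P(dw = 1 \mid cigs = 0, \bar x)}{\partial (npvis)} = \frac{\partial P(dw = 1 \mid cigs = 20, \bar x)}{\partial (npvis)} = -.0049$$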

    The differences between the logit and linear probability models are very small when calculating the differences in the probabilities of having an underweight baby between not smoking and smoking a pack per day. Even so, there seems to be rather a big difference in the marginal effects of the prenatal visits between non-smokers and smokers smoking a pack per day in the logit model, which is not captured by the linear probability model. Thus, if the mother does not smoke, an extra prenatal visit lowers the probability of her baby being underweight by 0.5 percentage points (.49) according to both the logit and LP models, whereas if the mother smokes a pack per day, the logit model shows a larger effect of an extra prenatal visit (the probability of having an underweight baby decreases by 0.66 percentage points) than the LPM (the probability of having an underweight baby decreases by 0.49 percentage points). Thus, the logit model captures that perhaps the prenatal visits have a larger effect (are more important) for smokers than for non-smokers. This effect is not captured by the LPM.
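The logit-versus-LPM comparison above can be tabulated in a few lines. The sketch starts from the index value −1.3385 at zero cigarettes and the coefficients .0281 (cigs) and −.0304 (npvis) quoted in the answer, together with the LPM coefficients .0054 and −.0049; the negative signs are assumed from the arithmetic and the surrounding interpretation, and all names are illustrative.

```python
import math

BASE_INDEX = -1.3385      # logit index at cigs = 0 and sample means of the rest
LOGIT_CIGS = 0.0281
LOGIT_NPVIS = -0.0304     # sign assumed from "lowers the probability"
LPM_CIGS = 0.0054
LPM_NPVIS = -0.0049       # sign assumed, as above


def logit_prob(cigs):
    """P(dw = 1) at a given number of cigarettes per day."""
    return 1.0 / (1.0 + math.exp(-(BASE_INDEX + LOGIT_CIGS * cigs)))


def logit_npvis_effect(cigs):
    """Marginal effect of one more prenatal visit: beta * F * (1 - F)."""
    p = logit_prob(cigs)
    return LOGIT_NPVIS * p * (1.0 - p)


diff_20_vs_0 = logit_prob(20) - logit_prob(0)
diff_20_vs_10 = logit_prob(20) - logit_prob(10)
effect_nonsmoker = logit_npvis_effect(0)
effect_pack_a_day = logit_npvis_effect(20)

print("logit:", round(diff_20_vs_0, 4), round(diff_20_vs_10, 4),
      round(effect_nonsmoker, 4), round(effect_pack_a_day, 4))
print("LPM:  ", LPM_CIGS * 20, LPM_CIGS * 10, LPM_NPVIS, LPM_NPVIS)
```

The LPM row repeats the same prenatal-visit effect for both groups by construction, which is the limitation the paragraph above describes.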

    6. Lecture notes on non-parametric kernel estimation: Write down the estimator. Explain intuitively what it does (done in lectures by using the uniform kernel). Derive (or state) the bias and variance of the estimator and show that the bandwidth choice introduces a trade-off: a small bandwidth gives small bias, but large variance. The choice of bandwidth is thus not trivial.


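    and

    $$y_{it} - \bar y_i = \beta(x_{it} - \bar x_i) + (v_{it} - \bar v_i).$$

    When $\theta = 0$, $\lambda = 1$, the RE estimator becomes OLS. Why?

    $$\hat\beta_g = \frac{W_{xy} + B_{xy}}{W_{xx} + B_{xx}} = \frac{T_{xy}}{T_{xx}}$$

    and

    $$y_{it} = \beta_0 + \beta x_{it} + v_{it}.$$

    [3 marks each]. Extra marks were given for noting that the output gives the OLS/Between estimate of 0.3052 computed in part (b). Why?

    $$\hat\beta_g = \frac{\lambda B_{xy}}{\lambda B_{xx}} = \frac{B_{xy}}{B_{xx}}$$

    Also, an estimate of $\lambda$ for the RE model can be computed:

    $$\hat\lambda = \frac{0.34825^2}{0.34825^2 + 2.0 \times 0.37330^2} = 0.303,$$

    with $\hat\theta = 1 - \sqrt{\hat\lambda} = 0.449$. This doesn't help much.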

    [8 marks]
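The two RE quantities computed above (0.303 and 0.449) can be reproduced as follows. The sketch assumes that 0.34825 is the estimated standard deviation of the idiosyncratic error, 0.37330 that of the individual effect, and T = 2; this labelling is inferred from the standard formula and is not stated in the surviving text, which is why the variables carry descriptive names rather than Greek letters.

```python
import math

sd_idiosyncratic = 0.34825   # assumed: idiosyncratic error standard deviation
sd_individual = 0.37330      # assumed: individual-effect standard deviation
periods = 2.0                # T

# Ratio of idiosyncratic variance to (idiosyncratic + T * individual) variance.
variance_ratio = sd_idiosyncratic ** 2 / (
    sd_idiosyncratic ** 2 + periods * sd_individual ** 2
)

# Quasi-demeaning weight: 0 gives pooled OLS, 1 gives the within (FE) estimator.
quasi_demeaning_weight = 1.0 - math.sqrt(variance_ratio)

print(round(variance_ratio, 3), round(quasi_demeaning_weight, 3))
```

A weight of about 0.45 sits roughly halfway between pooled OLS and FE, so the RE estimate is not close to either extreme.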


    2. (a) In order for a variable, z, to serve as a valid instrument for $y_2$, the following must be true

    i. The instrument must be exogenous:

    $$\mathrm{Cov}(z, u) = 0 \qquad (15.4)$$

    ii. The instrument must be correlated with the endogenous variable $y_2$:

    $$\mathrm{Cov}(z, y_2) \neq 0 \qquad (15.5)$$

    iii. z does not belong in (15.2), the model being estimated.

    $$y_2 = \pi_0 + \pi_1 z + v.$$

    We sometimes refer to this regression as the first-stage regression.

    Consider (15.2). Multiply by 1 and z, and take expectations:

    $$1: \quad E(y_1) = \beta_0 + \beta_1 E(y_2) \quad \text{as } E(u) = 0$$

    $$z: \quad E(y_1 z) = \beta_0 E(z) + \beta_1 E(y_2 z) + E(uz)$$

    Now subtract first times E(z) from second:

    $$E(y_1 z) - E(z)E(y_1) = \beta_1 [E(y_2 z) - E(z)E(y_2)] + E(uz)$$

    $$\text{or} \quad \mathrm{Cov}(z, y_1) = \beta_1 \mathrm{Cov}(z, y_2) + \mathrm{Cov}(z, u)$$

    Given our assumptions, then

    $$\beta_1 = \frac{\mathrm{Cov}(z, y_1)}{\mathrm{Cov}(z, y_2)}. \qquad (15.9)$$

    Thus the IV estimator for $\beta_1$ is

    $$\hat\beta_1 = \frac{\sum (z_i - \bar z)(y_{1i} - \bar y_1)}{\sum (z_i - \bar z)(y_{2i} - \bar y_2)}. \qquad (15.10)$$

    [7 marks]
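The ratio-of-covariances form in (15.9) and (15.10) translates directly into code. The data below are constructed for illustration only: u is built to be exactly uncorrelated with z in the sample but is part of y2, so OLS is pushed away from the true slope of 2 while IV recovers it exactly. None of these numbers come from the exam.

```python
def sample_cov(a, b):
    """Sample covariance between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def iv_slope(z, y1, y2):
    """IV estimate of the coefficient on y2: Cov(z, y1) / Cov(z, y2)."""
    return sample_cov(z, y1) / sample_cov(z, y2)


def ols_slope(y1, y2):
    """OLS estimate: Cov(y2, y1) / Var(y2)."""
    return sample_cov(y2, y1) / sample_cov(y2, y2)


# Hypothetical data: u sums to zero and is orthogonal to z, but enters y2.
z = [-2, -1, 0, 1, 2]
u = [1, -2, 2, -2, 1]
y2 = [zi + ui for zi, ui in zip(z, u)]             # endogenous regressor
y1 = [1 + 2 * y2i + ui for y2i, ui in zip(y2, u)]  # true slope is 2

beta_iv = iv_slope(z, y1, y2)
beta_ols = ols_slope(y1, y2)
print(beta_iv, beta_ols)
```

Because y2 and u are positively correlated here, the OLS slope overshoots, mirroring the "biased upwards" argument made for the return to education.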

    (b) The correlate command suggests both IVs are weakly correlated with education. However, one should regress education on both IVs. Now husband's age is not significant, and regional unemployment is. [1 mark for each var.]

    Need to discuss whether either variable belongs in the wage equation. The answer is no and no, but some noted that in the Wage Curve literature regional unemployment is in the wage equation. [1 mark for each var.]

    Need to discuss whether either variable is correlated with unobserved ability/motivation. Again, the answer is no and no, but one might argue that some clever women marry older men! [1 mark for each var.]

    [6 marks]


    (c) The OLS estimate of 0.108 is absolutely standard (an increase of one year's schooling increases wages by 11%).[1 mark]

    But it is probably biased upwards (too big) [1], because unobserved ability is positively correlated with education [1].

    If the two IVs satisfy the three conditions (part (a)), then the IV estimate is consistent, and lower (0.033), as expected. However, the standard error on education is 7 times bigger [1]. The 95% confidence interval (-0.17, 0.23) contains all reasonable values so we learn nothing from this regression [1]. This is because the IV is weak [1].

    Bonus if students mention that one might prefer IV if

    $$\frac{\mathrm{Corr}(z, u)}{\mathrm{Corr}(z, x)} < \mathrm{Corr}(x, u)$$

    As the IV gets weaker, $\mathrm{Corr}(z, x) \to 0$. Might prefer OLS when $\mathrm{Corr}(z, u) \neq 0$.[1]

    (d) Taken straight from the lecture notes:

    • If we do not have endogeneity, both OLS and IV are consistent.

    • However, OLS is preferred to IV if we do not have an endogeneity problem, because the standard errors will be smaller.

    • A test for endogeneity is rather simple. The idea of the Hausman test is to see if the estimates from OLS and IV are different.

    • Consider

    $$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + u_1, \qquad (15.49)$$

    with reduced-form

    $$y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3 + \pi_4 z_4 + v_2 \qquad (15.50)$$

    • If $y_2$ is endogenous, $E(y_2 u_1) \neq 0$, then $v_2$ (from the reduced form equation) and $u_1$ from the structural model will be correlated:

    $$E(y_2 u_1) = \pi_0 E(u_1) + \pi_1 E(z_1 u_1) + \dots + \pi_4 E(z_4 u_1) + E(v_2 u_1) = E(v_2 u_1)$$

    • Save the residuals from the first stage and include them in the structural equation (which of course has $y_2$ in it):

    $$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \beta_3 z_2 + \delta_1 \hat v_2 + u_1, \qquad (15.51)$$

    • If the coefficient on the residual is statistically different from zero, reject the null of exogeneity. Iow, $H_0: \delta_1 = 0$ tests $E(y_2 u_1) = E(v_2 u_1) = 0$.

    The regressions match these notes exactly, ie $y_1$ is lwage, $y_2$ is educ, $z_1$ is exper and $z_2$ is expersq. The IV estimate is obtained twice (0.082) and is very similar to OLS.

    The crucial equation is (15.51) and students need to explain why adding $\hat v_2$ tests for endogeneity and that it also generates IV estimates. Then they need to state that $\hat v_2$ is insignificant and so endogeneity is rejected, which is why OLS and IV are similar. However, this conclusion is again based on weak IVs. [6 marks]


    3. Taken straight from the lecture notes. Policy analysis is discussed in both Pooled Cross Sections and Panel Data parts.

    4. (a) An expression for the log likelihood function of the logit model:

    $$L(\beta) = \prod_i F(x_i'\beta)^{y_i}\,(1 - F(x_i'\beta))^{1 - y_i}$$

    $$l(\beta) = \sum_i y_i \log F(x_i'\beta) + \sum_i (1 - y_i)\log(1 - F(x_i'\beta)) = \sum_{y_i = 1} \log F(x_i'\beta) + \sum_{y_i = 0} \log(1 - F(x_i'\beta)),$$

    so for the logit:

    $$l(\beta) = \sum_{y_i = 0} \log\frac{1}{1 + e^{x_i'\beta}} + \sum_{y_i = 1} \log\frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

    $$= -\sum_{y_i = 0} \log(1 + e^{x_i'\beta}) + \sum_{y_i = 1} x_i'\beta - \sum_{y_i = 1} \log(1 + e^{x_i'\beta})$$

    $$= \sum_{y_i = 1} x_i'\beta - \sum_{i=1}^{n} \log(1 + e^{x_i'\beta}).$$

    (b) Interpreting the coefficient estimates from the logit and probit: The effect of age is negative, ie the probability of being unemployed decreases with age. Males are more likely to be unemployed than females. Having a university degree lowers the probability of being unemployed.
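As a check on the algebra in part (a), the general binary-choice log likelihood and the simplified logit form can be evaluated side by side. The data and coefficient vector below are made up purely for the comparison, and the function names are illustrative.

```python
import math


def linear_index(beta, x_row):
    return sum(b * x for b, x in zip(beta, x_row))


def loglik_general(beta, rows, y):
    """sum_i [ y_i log F(x_i'b) + (1 - y_i) log(1 - F(x_i'b)) ] with logistic F."""
    total = 0.0
    for x_row, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-linear_index(beta, x_row)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def loglik_logit_simplified(beta, rows, y):
    """sum_{y_i = 1} x_i'b  -  sum_i log(1 + exp(x_i'b))."""
    total = 0.0
    for x_row, yi in zip(rows, y):
        z = linear_index(beta, x_row)
        total += yi * z - math.log(1.0 + math.exp(z))
    return total


# Hypothetical data: a constant plus one regressor.
rows = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.0), (1.0, -0.3)]
y = [1, 0, 1, 0, 1]
beta = (0.2, 0.7)

ll_general = loglik_general(beta, rows, y)
ll_simplified = loglik_logit_simplified(beta, rows, y)
print(ll_general, ll_simplified)
```

The two values agree up to floating-point error, and both are negative because each term is the log of a probability.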

    (c) The probability of being unemployed when male:

    $$p_{male} = P(y = 1 \mid x_{1i} = 1, x_{2i} = \overline{age}, x_{3i} = \overline{educ}) = F(\hat\beta_0 + \hat\beta_1 + \hat\beta_2 \bar x_2 + \hat\beta_3 \bar x_3)$$

    $$= F(-1.5741 + .2149 + (-.0362)(38.8491) + (-.9473)(.1552))$$

    $$= F(-2.9125) = \frac{e^{-2.9125}}{1 + e^{-2.9125}} = .0515.$$

    The probability of being unemployed when female:

    $$p_{female} = P(y = 1 \mid x_{1i} = 0, x_{2i} = \overline{age}, x_{3i} = \overline{educ}) = F(\hat\beta_0 + \hat\beta_2 \bar x_2 + \hat\beta_3 \bar x_3)$$

    $$= F(-2.9125 - .2149) = F(-3.1274) = \frac{e^{-3.1274}}{1 + e^{-3.1274}} = .0420.$$

    The difference between genders is thus .0515 − .0420 = .010, ie women are 1 percentage point less likely to be unemployed than men.

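    The effect of age on the probability of being unemployed:

    $$\frac{\partial P(y = 1 \mid \bar x)}{\partial (age)} = \hat\beta_2 f(\hat\beta_0 + \hat\beta_1 \overline{male} + \hat\beta_2 \overline{age} + \hat\beta_3 \overline{educ})$$

    $$= \hat\beta_2 f(-1.5741 + .2149(.5216) + (-.0362)(38.8491) + (-.9473)(.1552)) = \hat\beta_2 f(-3.0145)$$

    $$= \hat\beta_2 \frac{e^{-3.0145}}{(1 + e^{-3.0145})^2} = (-.0362)(.0446) = -.00161,$$

    i.e. one additional year of age decreases the probability of being unemployed by 0.16 percentage points.

    The difference between having a university degree or not for males:

    $$P(y = 1 \mid x_1 = 1, x_2 = \overline{age}, x_3 = 1) - P(y = 1 \mid x_1 = 1, x_2 = \overline{age}, x_3 = 0)$$

    $$= F(\hat\beta_0 + \hat\beta_1 + \hat\beta_2 \overline{age} + \hat\beta_3) - F(\hat\beta_0 + \hat\beta_1 + \hat\beta_2 \overline{age}) = F(-3.7120) - F(-2.7647)$$

    $$= \frac{e^{-3.7120}}{1 + e^{-3.7120}} - \frac{e^{-2.7647}}{1 + e^{-2.7647}} = -.0354,$$

    similarly for women we find: $P(y = 1 \mid x_1 = 0, x_2 = \overline{age}, x_3 = 1) - P(y = 1 \mid x_1 = 0, x_2 = \overline{age}, x_3 = 0) = -.0291$, i.e. the effect is larger for men.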

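    5. (a) The income elasticity e for tobacco resulting from OLS:

    $$w = \beta_0 + \beta_1 \ln x + \beta_2(\text{children}) + \beta_3(\text{adults}) + \beta_4(\text{age}) + u,$$

    so

    $$\frac{\partial w}{\partial \ln x} = \beta_1,$$

    i.e.

    $$e = \frac{1}{w}\frac{\partial w}{\partial \ln x} + 1 = \frac{\beta_1}{w} + 1.$$

    When calculating this, use the average budget share:

    $$\hat e = \frac{-.0142}{.0322} + 1 = .5591.$$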
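The elasticity formula is a one-line calculation. In this sketch the coefficient on ln x is entered as −.0142 (the sign is inferred from the result being below 1) and the average budget share as .0322; the function name is illustrative.

```python
def income_elasticity(beta_ln_x, budget_share):
    """e = beta_1 / w + 1 for a budget-share equation that is linear in ln x."""
    return beta_ln_x / budget_share + 1.0


e_ols = income_elasticity(beta_ln_x=-0.0142, budget_share=0.0322)
print(round(e_ols, 3))
```

With the rounded inputs shown here the result agrees with the reported .5591 to three decimal places; the small gap in the fourth decimal presumably reflects unrounded coefficients in the original output.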

    (b) From the output it can be seen that there is a large proportion of zeroes (62%), hence it is likely that OLS estimates are biased.

    (c) A Tobit model would be appropriate because of the large number of zeroes and because when the budget share is actually positive, we can regard it as a continuous variable. The appropriate tobit model:

    $$y_i^* = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

    $$y_i = \begin{cases} y_i^*, & y_i^* \ge 0 \\ 0, & y_i^* < 0 \end{cases}$$

    (d) The income elasticity $e_{TOBIT}$ resulting from the tobit model: $e_{TOBIT} = .2358$. This is smaller than the elasticity resulting from OLS, suggesting that in fact tobacco is not that income-sensitive. The difference highlights the importance of using the tobit model instead of OLS; OLS overestimates the income elasticity.

    6. Estimating a linear regression of wages on education levels, years of labour market experience and ages by OLS may not be appropriate because the sample of workers may not be a random one: people select themselves into employment and it is not random who chooses to take a job/who can get a job, or not. Hence OLS may be biased. The appropriate sample selection model:

    $$h_i^* = x_{2i}'\beta_2 + \varepsilon_{2i} \quad \text{(selection equation)}$$

    $$w_i^* = x_{1i}'\beta_1 + \varepsilon_{1i} \quad \text{(wage equation)}$$

    $$w_i = w_i^*, \ h_i = 1 \ \text{ if } h_i^* > 0$$

    $$w_i \ \text{not observed and } h_i = 0 \ \text{ if } h_i^* \le 0$$

    One can show that the conditional mean $E(w|h = 1) = x_1'\beta_1 + \sigma_{12}\lambda(x_2'\beta_2)$ (average wage, conditional on being in the labour force), which is different from the OLS mean which would be $E(w) = x_1'\beta_1$. Explanation for how Heckman's Two Step Estimator works: particularly, emphasize that if one can find no variables that enter the selection equation but not the wage equation, identification will happen off functional form only, which is not as desirable as having real identification. See also slides on Heckman Two-Step.