Econ5025 Practice Problems

Econ 5025 Applied Econometrics
York University, Department of Economics
Professor Xianghong Li

Practice Problems

Appendix A (review)

1. Suppose the following equation describes the relationship between the average number of classes missed during a semester (missed) and the distance from school (distance, measured in miles):
   $\mathit{missed} = 3 + 0.2\,\mathit{distance}$
   a. Sketch this line, being sure to label the axes. How do you interpret the intercept in this equation?
   b. What is the average number of classes missed for someone who lives five miles away?
   c. What is the difference in the average number of classes missed for someone who lives 10 miles away and someone who lives 20 miles away?
2. In Example A.2, quantity of compact discs was related to price and income by
   $\mathit{quantity} = 120 - 9.8\,\mathit{price} + .03\,\mathit{income}$.
   What is the demand for CDs if price = 15 and income = 200? What does this suggest about using linear functions to describe demand curves?
3. Suppose the unemployment rate in the United States goes from 6.4% in one year to 5.6% in the next.
   a. What is the percentage point decrease in the unemployment rate?
   b. By what percentage has the unemployment rate fallen?
4. Suppose that the return from holding a particular firm's stock goes from 15% in one year to 18% in the following year. The majority shareholder claims that the stock return only increased by 3%, while the chief executive officer claims that the return on the firm's stock has increased by 20%. Reconcile their disagreement.
5. Suppose that Person A earns $35,000 per year and Person B earns $42,000.
   a. Find the exact percentage by which Person B's salary exceeds Person A's.
   b. Now, use the difference in natural logs to find the approximate percentage difference.
6. Suppose the following model describes the relationship between annual salary (salary) and the number of previous years of labour market experience (exper):
   $\log(\mathit{salary}) = 10.6 + 0.027\,\mathit{exper}$
   a. What is salary when exper = 0? When exper = 5? (Hint: You will need to exponentiate.)
   b. Use equation (A.28) to approximate the percentage increase in salary when exper increases by five years.
   c. Use the results of (a) to compute the exact percentage difference in salary when exper = 5 versus exper = 0. Comment on how this compares with the approximation in (b).
7. Let grthemp denote the proportionate growth in employment, at the county level, from 1990 to 1995, and let salestax denote the county sales tax rate, stated as a proportion. Interpret the intercept and slope in the equation
   $\mathit{grthemp} = .043 - .78\,\mathit{salestax}$
8. Suppose the yield of a certain crop (in bushels per acre) is related to fertilizer amount (in pounds per acre) as
   $\mathit{yield} = 120 + .19\sqrt{\mathit{fertilizer}}$
   a. Graph this relationship by plugging in several values for fertilizer.
   b. Describe how the shape of this relationship compares with a linear relationship between yield and fertilizer.

Chapter 1

1. Suppose that you are asked to conduct a study to determine whether smaller class sizes lead to improved student performance of fourth graders.
   a. If you could conduct any experiment you want, what would you do? Be specific.
   b. More realistically, suppose you can collect observational data on several thousand fourth graders in a given state. You can obtain the size of their fourth-grade class and a standardized test score taken at the end of fourth grade. Why might you expect a negative correlation between class size and test score?
   c. Would a negative correlation based on observational data necessarily show that smaller class sizes cause better performance? Explain.
2. A justification for job training programs is that they improve worker productivity. Suppose that you are asked to evaluate whether more job training makes workers more productive. However, rather than having data on individual workers, you have access to data on manufacturing firms in Ohio. In particular, for each firm, you have information on hours of job training per worker (training) and number of nondefective items produced per worker hour (output).
   a. Carefully state the ceteris paribus thought experiment underlying this policy question.
   b. Does it seem likely that a firm's decision to train its workers will be independent of worker characteristics? What are some of those measurable and unmeasurable worker characteristics?
   c. Name a factor other than worker characteristics that can affect worker productivity.
   d. If you find a positive correlation between output and training, would you have convincingly established that job training makes workers more productive? Explain.
3. Suppose at your university you are asked to find the relationship between weekly hours spent studying (study) and weekly hours spent working (work). Does it make sense to characterize the problem as inferring whether study "causes" work or work "causes" study? Explain.

Computer exercises

C1.1 Use the data in WAGE1 for this exercise.
   a. Find the average education level in the sample. What are the lowest and highest years of education?
   b. Find the average hourly wage in the sample. Does it seem high or low?
   c. The wage data are reported in 1976 dollars. Using the Economic Report of the President (2004 or later), obtain and report the Consumer Price Index (CPI) for the years 1976 and 2003.
   d. Use the CPI values from part (c) to find the average hourly wage in 2003 dollars. Now does the average hourly wage seem reasonable?
   e. How many women are in the sample? How many men?
C1.2 Use the data in BWGHT to answer this question.
   a. How many women are in the sample, and how many report smoking during pregnancy?
   b. What is the average number of cigarettes smoked per day? Is the average a good measure of the "typical" woman in this case? Explain.
   c. Among women who smoked during pregnancy, what is the average number of cigarettes smoked per day? How does this compare with your answer from (b), and why?
   d. Find the average of fatheduc in the sample. Why are only 1,192 observations used to compute this average?
   e. Report the average family income and its standard deviation in dollars.
C1.3 The data in MEAP01 are for the state of Michigan in the year 2001. Use these data to answer the following questions.
   a. Find the largest and smallest values of math4. Does the range make sense? Explain.
   b. How many schools have a perfect pass rate on the math test? What percentage is this of the total sample?
   c. How many schools have math pass rates of exactly 50 percent?
   d. Compare the average pass rates for the math and reading scores. Which test is harder to pass?
   e. Find the correlation between math4 and read4. What do you conclude?
   f. The variable exppp is expenditure per pupil. Find the average of exppp along with its standard deviation. Would you say there is wide variation in per pupil spending?
   g. Suppose School A spends $6,000 per student and School B spends $5,500 per student. By what percentage does School A's spending exceed School B's? Compare this to $100 \cdot [\log(6000) - \log(5500)]$, which is the approximate percentage difference based on the difference in the natural logs. (See Section A.4 in Appendix A.)
C1.4 The data in JTRAIN2 come from a job training experiment conducted for low-income men during 1976-1977; see LaLonde (1986).
   a. Use the indicator variable train to determine the fraction of men receiving job training.
   b. The variable re78 is earnings from 1978, measured in thousands of 1982 dollars. Find the average of re78 for the sample of men receiving job training and the sample not receiving job training. Is the difference economically large?
   c. The variable unem78 is an indicator of whether a man is unemployed in 1978. What fraction of the men who received job training are unemployed? What about for men who did not receive job training? Comment on the difference.
   d. From parts (b) and (c), does it appear that the job training program was effective? What would make our conclusions more convincing?
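The comparison asked for in C1.3(g) between an exact percentage difference and its log approximation can be sketched in a few lines. This is only an illustration using the two spending figures from the problem; the function names are hypothetical and chosen for this sketch.

```python
import math


def exact_pct_diff(a: float, b: float) -> float:
    """Exact percentage by which a exceeds b."""
    return 100.0 * (a - b) / b


def log_pct_diff(a: float, b: float) -> float:
    """Approximate percentage difference, 100 * [log(a) - log(b)]."""
    return 100.0 * (math.log(a) - math.log(b))


# Spending figures from C1.3(g): School A = $6,000, School B = $5,500.
exact = exact_pct_diff(6000, 5500)
approx = log_pct_diff(6000, 5500)
print(f"exact:  {exact:.2f}%")
print(f"approx: {approx:.2f}%")
```

The log difference understates the exact figure here, and the gap widens as the two values move further apart, which is why the approximation is usually reserved for small changes.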
Chapter 2

1. Let kids denote the number of children ever born to a woman, and let educ denote years of education for the woman. A simple model relating fertility to years of education is
   $\mathit{kids} = \beta_0 + \beta_1\,\mathit{educ} + u$,
   where u is the unobserved error.
   a. Name a few factors that may be contained in u. Are these likely to be correlated with level of education?
   b. Will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain.
2. In the simple linear regression model $y = \beta_0 + \beta_1 x + u$, suppose that $E(u) \neq 0$. Letting $\alpha_0 = E(u)$, show that the model can always be rewritten with the same slope, but a new intercept and error, where the new error has a zero expected value.
3. The following table contains the ACT scores and the GPA (grade point average) for eight college students. Grade point average is based on a four-point scale and has been rounded to one digit after the decimal.

   Student   GPA   ACT
   1         2.8   21
   2         3.4   24
   3         3.0   26
   4         3.5   27
   5         3.6   29
   6         3.0   25
   7         2.7   25
   8         3.7   30

   a. Estimate the relationship between GPA and ACT using OLS; that is, obtain the intercept and slope estimates (formulas: equations 2.19 and 2.17) in
      $\widehat{\mathit{GPA}} = \hat\beta_0 + \hat\beta_1\,\mathit{ACT}$
   b. Compute the fitted values and residuals for each observation, and verify that the residuals (approximately) sum to zero.
   c. What is the predicted value of GPA when ACT = 20?
   d. How much of the variation in GPA for these eight students is explained by ACT? Explain.
4. The data set BWGHT.RAW contains data on births to women in the United States. Two variables of interest are the dependent variable, infant birth weight in ounces (bwght), and an explanatory variable, average number of cigarettes the mother smoked per day during pregnancy (cigs). The following simple regression was estimated using data on n = 1388 births:
   $\widehat{\mathit{bwght}} = 119.77 - 0.514\,\mathit{cigs}$
   a. What is the predicted birth weight when cigs = 0? What about when cigs = 20 (one pack per day)? Comment on the difference.
   b. Does this simple regression necessarily capture a causal relationship between the child's birth weight and the mother's smoking habits? Explain.
   c. To predict a birth weight of 125 ounces, what would cigs have to be? Comment.
   d. The proportion of women in the sample who do not smoke while pregnant is about 0.85. Does this help reconcile your finding from (c)?
5. In the linear consumption function
   $\widehat{\mathit{cons}} = \hat\beta_0 + \hat\beta_1\,\mathit{inc}$
   the (estimated) marginal propensity to consume (MPC) out of income is simply the slope, $\hat\beta_1$, while the average propensity to consume (APC) is $\widehat{\mathit{cons}}/\mathit{inc} = \hat\beta_0/\mathit{inc} + \hat\beta_1$. Using observations for 100 families on annual income and consumption (both measured in dollars), the following equation is obtained:
   $\widehat{\mathit{cons}} = -124.84 + 0.853\,\mathit{inc}$
   $n = 100$, $R^2 = 0.692$
   a. Interpret the intercept in this equation, and comment on its sign and magnitude.
   b. What is the predicted consumption when family income is $30,000?
   c. With inc on the x-axis, draw a graph of the estimated MPC and APC starting at the annual income level of $1000.
6. Using the data from 1988 for houses sold in Andover, Massachusetts, from Kiel and McClain (1995), the following equation relates housing price (price) to the distance from a recently built garbage incinerator (dist):
   $\widehat{\log(\mathit{price})} = 9.40 + 0.312\,\log(\mathit{dist})$
   $n = 135$, $R^2 = 0.162$
   a. Interpret the coefficient on log(dist). Is the sign of this estimate what you expect it to be?
   b. Do you think simple regression provides an unbiased estimator of the ceteris paribus elasticity of price with respect to dist? (Think about the city's decision on where to put the incinerator.)
7. For the population of firms in the chemical industry, let rd denote annual expenditures on research and development, and let sales denote annual sales (both are in millions of dollars). Write down a regression model that uses sales to explain the variation in rd. Your model should imply a constant elasticity between rd and sales. Which parameter is the elasticity?

Computer Exercises

C2.1 The data in 401K.RAW are a subset of data analyzed by Papke (1995) to study the relationship between participation in a 401(k) pension plan and the generosity of the plan.
   The variable prate is the percentage of eligible workers with an active account; this is the variable we would like to explain. The measure of generosity is the plan match rate, mrate. This variable gives the average amount the firm contributes to each worker's plan for each $1 contribution by the worker. For example, if mrate = 0.5, then a $1 contribution by the worker is matched by a 50-cent contribution by the firm.
   a. Find the average participation rate and the average match rate in the sample of plans.
   b. Read the STATA output of the following simple regression equation
      $\widehat{\mathit{prate}} = \hat\beta_0 + \hat\beta_1\,\mathit{mrate}$,
      and report the results along with the sample size and R-squared.
   c. Interpret the intercept in your equation. Interpret the coefficient on mrate.
   d. Find the predicted prate when mrate = 3.5. Is this a reasonable prediction? Explain what is happening here.
   e. How much of the variation in prate is explained by mrate?
C2.2 The data set in CEOSAL2.RAW contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten is prior number of years as company CEO.
   a. Find the average salary and the average tenure in the sample.
   b. How many CEOs are in their first year as CEO (that is, ceoten = 0)? What is the longest tenure as a CEO?
   c. Read the STATA output of the following simple regression model
      $\log(\mathit{salary}) = \beta_0 + \beta_1\,\mathit{ceoten} + u$
      and write down the sample regression function. What is the (approximate) predicted percentage increase in salary given one more year as a CEO?
C2.3 Use the data in WAGE2.RAW to estimate a simple regression explaining monthly salary (wage) in terms of IQ score (IQ).
   a. Find the average salary and average IQ in the sample. What is the sample standard deviation of IQ? (IQ scores are standardized so that the average in the population is 100 with a standard deviation equal to 15.)
   b. I estimated a simple regression model where a one-point increase in IQ changes wage by a constant dollar amount. Use the STATA output to find the predicted increase in wage for an increase in IQ of 15 points.
      Does IQ explain most of the variation in wage?
   c. I then estimated a model where each one-point increase in IQ has the same percentage effect on wage. If IQ increases by 15 points, what is the approximate percentage increase in predicted wage? Calculate the same effect without using the approximation and compare the two results.
C2.4 We used the data in MEAP93.RAW for Example 2.12. Let math10 denote the percentage of tenth graders at a high school receiving a passing score on a standardized mathematics exam. Now we want to explore the relationship between the math pass rate (math10) and spending per student (expend).
   a. Do you think each additional dollar spent has the same effect on the pass rate, or does a diminishing effect seem more appropriate? Explain.
   b. I estimated the model
      $\mathit{math10} = \beta_0 + \beta_1\,\log(\mathit{expend}) + u$.
      Read the STATA output and write down the sample regression function, including the sample size and R-squared.
   c. How big is the estimated spending effect? Namely, if spending increases by 10 percent, what is the estimated percentage point increase in math10?

Chapter 3

1. Using the data in GPA2.RAW on 4,137 college students, the following equation was estimated by OLS:
   $\widehat{\mathit{colgpa}} = 1.392 - .0135\,\mathit{hsperc} + .00148\,\mathit{sat}$
   $n = 4{,}137$, $R^2 = .273$,
   where colgpa is measured on a four-point scale, hsperc is percentile in the high school graduating class (defined so that, for example, hsperc = 5 means top five percent of the class), and sat is the combined math and verbal scores on the student achievement test.
   a. Why does it make sense for the coefficient on hsperc to be negative?
   b. What is the predicted college GPA when hsperc = 20 and sat = 1050?
   c. Suppose that two high school graduates, A and B, graduated in the same percentile from high school, but student A's SAT score was 140 points higher (about one standard deviation in the sample). What is the predicted difference in college GPA for these two students?
   d. Holding hsperc fixed, what difference in SAT scores leads to a predicted colgpa difference of 0.50, or one-half of a grade point?
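Working with an estimated equation such as the colgpa regression in Chapter 3, problem 1, mostly means plugging values into the fitted line and using the slope coefficients for ceteris paribus comparisons. The sketch below assumes the coefficients 1.392, -.0135, and .00148 (the sign on hsperc is taken to be negative, as part (a) of the problem implies); the function name is hypothetical.

```python
def predict_colgpa(hsperc: float, sat: float) -> float:
    """Fitted value from colgpa-hat = 1.392 - .0135*hsperc + .00148*sat."""
    return 1.392 - 0.0135 * hsperc + 0.00148 * sat


# Fitted value at hsperc = 20, sat = 1050.
pred = predict_colgpa(20, 1050)

# Holding hsperc fixed, a 140-point SAT gap changes the prediction
# by the sat coefficient times 140.
gap_140 = 0.00148 * 140

# SAT difference needed for a 0.50 difference in predicted colgpa.
sat_for_half_point = 0.50 / 0.00148

print(f"predicted colgpa:        {pred:.3f}")
print(f"effect of +140 SAT:      {gap_140:.3f}")
print(f"SAT gap for 0.50 points: {sat_for_half_point:.0f}")
```

Because the model is linear in hsperc and sat, the intercept and the hsperc term cancel in any comparison that holds hsperc fixed, so only the sat coefficient matters for the last two quantities.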
2. The data in WAGE2.RAW on working men was used to estimate the following equation:
   $\widehat{\mathit{educ}} = 10.36 - .094\,\mathit{sibs} + .131\,\mathit{meduc} + .210\,\mathit{feduc}$
   $n = 722$, $R^2 = .214$,
   where educ is years of schooling, sibs is number of siblings, meduc is mother's years of schooling, and feduc is father's years of schooling.
   a. Does sibs have the expected effect? Explain. Holding meduc and feduc fixed, by how much does sibs have to increase to reduce predicted years of education by one year? (A noninteger answer is acceptable here.)
   b. Discuss the interpretation of the coefficient on meduc.
   c. Suppose that Man A has no siblings, and his mother and father each have 12 years of education. Man B has no siblings, and his mother and father each have 16 years of education. What is the predicted difference in years of education between B and A?
3. The median starting salary for new law school graduates is determined by
   $\log(\mathit{salary}) = \beta_0 + \beta_1\,\mathit{LSAT} + \beta_2\,\mathit{GPA} + \beta_3\,\log(\mathit{libvol}) + \beta_4\,\log(\mathit{cost}) + \beta_5\,\mathit{rank} + u$,
   where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending law school, and rank is a law school ranking (with rank = 1 being the best).
   a. Explain why we expect $\beta_5 \le 0$.
   b. What signs do you expect for the other slope parameters? Justify your answers.
   c. Using the data in LAWSCH85.RAW, the estimated equation is
      $\widehat{\log(\mathit{salary})} = 8.34 + .0047\,\mathit{LSAT} + .248\,\mathit{GPA} + .095\,\log(\mathit{libvol}) + .038\,\log(\mathit{cost}) - .0033\,\mathit{rank}$
      $n = 136$, $R^2 = .842$
      What is the predicted ceteris paribus difference in salary for schools with a median GPA different by one point? (Report your answer as a percentage.)
   d. Interpret the coefficient on the variable log(libvol).
   e. Would you say it is better to attend a better ranked law school? How much is a difference in ranking of 20 worth in terms of predicted starting salary?
4.
   In a study relating college grade point average to time spent in various activities, you distribute a survey to several students. The students are asked how many hours they spend each week in four activities: studying, sleeping, working, and leisure. Any activity is put into one of the four categories, so that for each student, the sum of hours in the four activities must be 168.
   a. In the model
      $\mathit{GPA} = \beta_0 + \beta_1\,\mathit{study} + \beta_2\,\mathit{sleep} + \beta_3\,\mathit{work} + \beta_4\,\mathit{leisure} + u$,
      does it make sense to hold sleep, work, and leisure fixed, while changing study?
   b. Explain why this model violates Assumption MLR.3.
   c. How could you reformulate the model so that its parameters have a useful interpretation and it satisfies Assumption MLR.3?
5. Consider the multiple regression model containing three independent variables, under Assumptions MLR.1 through MLR.4:
   $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$.
   You are interested in estimating the sum of the parameters on $x_1$ and $x_2$; call this $\theta_1 = \beta_1 + \beta_2$.
   a. Show that $\hat\theta_1 = \hat\beta_1 + \hat\beta_2$ is an unbiased estimator of $\theta_1$.
   b. Find $\mathrm{Var}(\hat\theta_1)$ in terms of $\mathrm{Var}(\hat\beta_1)$, $\mathrm{Var}(\hat\beta_2)$, and $\mathrm{Corr}(\hat\beta_1, \hat\beta_2)$.
6. Which of the following can cause OLS estimators to be biased?
   a. Heteroskedasticity.
   b. Omitting an important variable.
   c. A sample correlation coefficient of .95 between two independent variables both included in the model.
7. Suppose that average worker productivity at manufacturing firms (avgprod) depends on two factors, average hours of training (avgtrain) and average worker ability (avgabil):
   $\mathit{avgprod} = \beta_0 + \beta_1\,\mathit{avgtrain} + \beta_2\,\mathit{avgabil} + u$.
   Assume that this equation satisfies MLR.1 through MLR.4. If grants have been given to firms whose workers have less than average ability, so that avgtrain and avgabil are negatively correlated, what is the likely bias in $\tilde\beta_1$ obtained from the simple regression of avgprod on avgtrain? (Use terminology such as upward bias, downward bias, or biased toward zero.)
8. Suppose that you are interested in estimating the ceteris paribus relationship between y and $x_1$.
   For this purpose, you can collect data on two control variables, $x_2$ and $x_3$. (For concreteness, you might think of y as final exam score, $x_1$ as class attendance, $x_2$ as GPA up to the previous semester, and $x_3$ as SAT or ACT score.) Let $\tilde\beta_1$ be the simple regression estimate from y on $x_1$ and let $\hat\beta_1$ be the multiple regression estimate from y on $x_1$, $x_2$, $x_3$.
   a. If $x_1$ is highly correlated with $x_2$ and $x_3$ in the sample, and $x_2$ and $x_3$ have large partial effects on y, would you expect $\tilde\beta_1$ and $\hat\beta_1$ to be similar or very different? Explain.
   b. If $x_1$ is almost uncorrelated with $x_2$ and $x_3$, but $x_2$ and $x_3$ are highly correlated, will $\tilde\beta_1$ and $\hat\beta_1$ tend to be similar or very different? Explain.
   c. If $x_1$ is highly correlated with $x_2$ and $x_3$, and $x_2$ and $x_3$ have small partial effects on y, would you expect $\mathrm{se}(\tilde\beta_1)$ or $\mathrm{se}(\hat\beta_1)$ to be smaller? Explain.
   d. If $x_1$ is almost uncorrelated with $x_2$ and $x_3$, and $x_2$ and $x_3$ have large partial effects on y, and $x_2$ and $x_3$ are highly correlated, would you expect $\mathrm{se}(\tilde\beta_1)$ or $\mathrm{se}(\hat\beta_1)$ to be smaller? Explain.
9. Suppose the population model is
   $y = \beta_0 + \beta_1 x + u$.
   The key condition needed for OLS to consistently estimate the $\beta$'s is that the error term has mean zero and is uncorrelated with the regressor:
   $E(u) = 0$, $E(xu) = 0$.
   Show that the zero conditional mean assumption $E(u \mid x) = 0$ is stronger than the above condition. (Actually, given the zero conditional mean assumption, you can show the error term is uncorrelated with any function of x.)
10. Derivations related to OLS estimators
   a. Derive the OLS estimator for a simple regression (p. 29).
   b. Show that $\bar{\hat{y}} = \bar{y}$.
   c. Show that $\sum_{i=1}^{n} \hat{u}_i \hat{y}_i = 0$.

   d. Show that $SST = SSE + SSR$ (page 39).
   e. Partialling-out interpretation of multiple regression. Suppose the population regression is
      $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$
      Claim: $\hat\beta_1$ from this multiple regression is equal to $\hat\gamma_1$ from the following two steps (partialling-out procedure).
      Step 1: regress $x_{i1}$ on $x_{i2}, \dots, x_{ik}$ with an intercept to get the regression residual $\hat r_{i1}$.
      Step 2: regress $y_i$ on $\hat r_{i1}$ with an intercept:
      $y_i = \gamma_0 + \gamma_1 \hat r_{i1} + e_i$
      Then we claim $\hat\beta_1 = \hat\gamma_1$, where
      $\hat\gamma_1 = \dfrac{\sum_{i=1}^{n} \hat r_{i1}\, y_i}{\sum_{i=1}^{n} \hat r_{i1}^2}$.
      According to (2.19) on page 29, for the simple regression in step 2, we have
      $\hat\gamma_1 = \dfrac{\sum_{i=1}^{n} (\hat r_{i1} - \bar{\hat r}_1)(y_i - \bar y)}{\sum_{i=1}^{n} (\hat r_{i1} - \bar{\hat r}_1)^2}$.
      Show that
      $\dfrac{\sum_{i=1}^{n} (\hat r_{i1} - \bar{\hat r}_1)(y_i - \bar y)}{\sum_{i=1}^{n} (\hat r_{i1} - \bar{\hat r}_1)^2} = \dfrac{\sum_{i=1}^{n} \hat r_{i1}\, y_i}{\sum_{i=1}^{n} \hat r_{i1}^2}$
      (you need $\sum_{i=1}^{n} \hat r_{i1} = 0$, thus $\bar{\hat r}_1 = 0$).
      Show that $\hat\beta_1 = \hat\gamma_1$ (Appendix 3A.2 on page 113).
11. Omitted variable bias in OLS estimators: Suppose the true population model is
   $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2^* + u$.
   We assume this model satisfies the assumption $E(u \mid x_1, x_2^*) = E(u) = 0$. Our primary interest is in $\beta_1$, the partial effect of $x_1$ on y. For example, y is hourly wage (or log of hourly wage), $x_1$ is education, and $x_2^*$ is innate ability. In order to get an unbiased estimator of $\beta_1$, we should run a regression of y on $x_1$ and $x_2^*$. However, $x_2^*$ is not observed. If we regress y on $x_1$ only, the estimator of $\beta_1$ from this regression will suffer from omitted variable bias. Suppose $E(x_2^* \mid x_1) = \delta_0 + \delta_1 x_1$. Derive the bias in the estimator of $\beta_1$ from a simple regression of y on $x_1$ only.

Computer exercises

C3.1 A problem of interest to health officials (and others) is to determine the effects of smoking during pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking that affect birth weight are likely to be correlated with smoking, we should take those factors into account.
   For example, higher income generally results in access to better prenatal care, as well as better nutrition for the mother. A regression model that recognizes this is
   $\mathit{bwght} = \beta_0 + \beta_1\,\mathit{cigs} + \beta_2\,\mathit{faminc} + u$,
   where birth weight (bwght) is in ounces, cigs is average number of cigarettes the mother smoked per day during pregnancy, and family income (faminc) is in thousands.
   a. What is the most likely sign for $\beta_2$?
   b. Do you think cigs and faminc are likely to be correlated? Explain why the correlation might be positive or negative.
   c. I estimate the equation with and without faminc, using the data in BWGHT.RAW. Use STATA output to report the results in equation form, including the sample size and R-squared. Discuss the results, focusing on whether adding faminc substantially changes the estimated effect of cigs on bwght.
   d. Interpret the coefficient of faminc in the multiple regression. Do you think this effect is practically large?
C3.2 I use the data in HPRICE1.RAW to estimate the following model:
   $\mathit{price} = \beta_0 + \beta_1\,\mathit{sqrft} + \beta_2\,\mathit{bdrms} + u$,
   where price is the house price measured in thousands of dollars, sqrft is square footage of the house, and bdrms is number of bedrooms.
   a. Write out the sample regression function using the STATA output.
   b. What is the estimated increase in price for a house with one more bedroom, holding square footage constant?
   c. What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).
   d. What percentage of the variation in price is explained by square footage and number of bedrooms?
   e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling price for this house from the OLS regression line.
   f. The actual selling price of the first house in the sample was $300,000 (so price = 300). Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house?
C3.3 The file CEOSAL2.RAW contains data on 177 chief executive officers and can be used to examine the effects of firm performance on CEO salary. The variable salary is annual compensation, in thousands of dollars, ceoten is prior number of years as company CEO, profits is firm profit in millions, mktval is firm market value in millions, and sales is firm sales in millions.
   a. I estimate a model relating annual salary to firm sales and market value, making the model of the constant elasticity variety for both independent variables. Write the SRF using the STATA output.
   b. Then I add profits to the model in (a). I cannot include this variable in logarithmic form. Why? Would you say that these firm performance variables explain most of the variation in CEO salaries?
   c. Subsequently I add the variable ceoten to the model in (b). What is the estimated percentage return for another year of CEO tenure, holding other factors fixed?
   d. Find the sample correlation coefficient between the variables log(mktval) and profits. Are these variables highly correlated? What does this say about the OLS estimators?
C3.4 The data in ATTEND.RAW are used for this exercise.
   a. Report the minimum, maximum, and average values for the variables atndrte, priGPA, and ACT.
   b. I estimate the model
      $\mathit{atndrte} = \beta_0 + \beta_1\,\mathit{priGPA} + \beta_2\,\mathit{ACT} + u$.
      Write the SRF using the STATA output. Interpret the intercept. Does it have a useful meaning?
   c. Discuss the estimated slope coefficients. Are there any surprises?
   d. What is the predicted atndrte if priGPA = 3.65 and ACT = 20? What do you think of this result?
   e. If Student A has priGPA = 3.1 and ACT = 21 and Student B has priGPA = 2.1 and ACT = 26, what is the predicted difference in their attendance rates?
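The partialling-out claim in Chapter 3, problem 10(e), and the identity to be verified in C3.5 (short-regression slope equals the long-regression slope plus the omitted variable's coefficient times an auxiliary slope) are both exact algebraic properties of OLS, so they can be checked on any data set. The sketch below uses simulated data with two regressors and a small pure-Python OLS routine; the data-generating values, the helper names `solve` and `ols`, and the labels `gamma1_hat` and `delta1_tilde` are all assumptions made for this illustration, not part of the course material.

```python
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def ols(y, *xs):
    """OLS of y on an intercept and regressors xs; returns [b0, b1, ...]."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    k = len(cols)
    XtX = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    Xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    return solve(XtX, Xty)


# Simulated data (hypothetical parameter values): x1 and x2 are correlated.
random.seed(0)
n = 500
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.6 * v + random.gauss(0, 1) for v in x2]
y = [1.0 + 2.0 * a - 1.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Multiple regression of y on x1 and x2.
b0_hat, b1_hat, b2_hat = ols(y, x1, x2)

# Step 1: regress x1 on x2 and keep the residuals.
a0, a1 = ols(x1, x2)
r1 = [a - (a0 + a1 * b) for a, b in zip(x1, x2)]

# Step 2: slope from regressing y on the residuals (residuals have mean zero).
gamma1_hat = sum(r * v for r, v in zip(r1, y)) / sum(r * r for r in r1)

# Short regression of y on x1 only, and auxiliary regression of x2 on x1.
_, b1_tilde = ols(y, x1)
_, delta1_tilde = ols(x2, x1)

print(f"b1_hat = {b1_hat:.6f}, gamma1_hat = {gamma1_hat:.6f}")
print(f"b1_tilde = {b1_tilde:.6f}, "
      f"b1_hat + b2_hat*delta1_tilde = {b1_hat + b2_hat * delta1_tilde:.6f}")
```

Both printed pairs should agree up to floating-point error regardless of the seed, since neither identity depends on the model being correctly specified.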

C3.5 The data set in WAGE2.RAW is used for this problem. First I run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde\delta_1$. Then I run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde\beta_1$. Subsequently I run the multiple regression of log(wage) on educ and IQ, and obtain the slope coefficients, $\hat\beta_1$ and $\hat\beta_2$, respectively. Based on the above regression results, verify that
   $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$.
C3.6 The data in MEAP93.RAW are used to estimate the following regression.
   a. I estimate the model
      $\mathit{math10} = \beta_0 + \beta_1\,\log(\mathit{expend}) + \beta_2\,\mathit{lnchprg} + u$.
      Report the SRF, including the sample size and R-squared.
   b. What do you make of the intercept in (a)? In particular, does it make sense to set the two explanatory variables to zero? [Hint: Recall that log(1) = 0.]
   c. Now I run the simple regression of math10 on log(expend), and compare the slope coefficient with the estimate obtained in (a). Is the estimated spending effect now larger or smaller than in (a)?
   d. Report the correlation between lexpend = log(expend) and lnchprg. Does its sign make sense to you?
   e. Use (d) to explain your findings in (c).
C3.7 I use the data in DISCRIM.RAW for this question. These are zip code-level data on prices for various items at fast-food restaurants, along with characteristics of the zip code population, in New Jersey and Pennsylvania. The idea is to see whether fast-food restaurants charge higher prices in areas with a larger concentration of blacks.
   a. Report the sample mean of prpblck and income, along with their standard deviations. Can you deduce the units of measurement of prpblck and income?
   b. Consider a model to explain the price of soda, psoda, in terms of the proportion of the population that is black and median income:
      $\mathit{psoda} = \beta_0 + \beta_1\,\mathit{prpblck} + \beta_2\,\mathit{income} + u$
      Report the SRF, including the sample size and R-squared. Interpret the coefficient on prpblck. Do you think the effect of prpblck on price of soda is economically large (comparing two hypothetical communities, one 100% white and the other 100% black)?
   c. Compare the estimate from (b) with the simple regression estimate from psoda on prpblck. Is the discrimination effect larger or smaller when you control for income?
   d. A model with constant price elasticity with respect to income may be more appropriate. Report estimates of the model
      $\log(\mathit{psoda}) = \beta_0 + \beta_1\,\mathit{prpblck} + \beta_2\,\log(\mathit{income}) + u$
      If prpblck increases by .20 (20 percentage points), what is the estimated percentage change in psoda?
   e. Now add the variable prppov to the regression in (d). What happens to $\hat\beta_{\mathit{prpblck}}$?
   f. Report the correlation between log(income) and prppov. Is it roughly what you expected?
   g. Evaluate the following statement: "Because log(income) and prppov are so highly correlated, they have no business being in the same regression."

Chapter 4

1. Consider an equation to explain salaries of CEOs in terms of annual firm sales, return on equity (roe, in percentage), and return on the firm's stock (ros, in percentage):
   $\log(\mathit{salary}) = \beta_0 + \beta_1\,\log(\mathit{sales}) + \beta_2\,\mathit{roe} + \beta_3\,\mathit{ros} + u$.
   a. State the null hypothesis that, after controlling for sales and roe, ros has no effect on CEO salary. State the alternative that better stock market performance (higher ros) increases a CEO's salary.
   b. Using the data in CEOSAL1.RAW, the following SRF was obtained by OLS:
      $\widehat{\log(\mathit{salary})} = \underset{(.32)}{4.32} + \underset{(.035)}{.280}\,\log(\mathit{sales}) + \underset{(.0041)}{.0174}\,\mathit{roe} + \underset{(.00054)}{.00024}\,\mathit{ros}$
      $n = 209$, $R^2 = .283$.
      What is the effect of ros on the predicted salary if ros increases by 50 percentage points? Does ros have a practically large effect on salary?
   c. Test the null hypothesis that ros has no effect on salary against the alternative that ros has a positive effect. Carry out the test at the 10% significance level.
   d. Would you include ros in a final model explaining CEO compensation in terms of firm performance? Explain.
2. The variable rdintens is expenditures on research and development (R&D) as a percentage of sales. Sales are measured in millions of dollars. The variable profmarg is profits as a percentage of sales.
   Using the data in RDCHEM.RAW for 32 firms in the chemical industry, the following equation is estimated:
   $\widehat{\mathit{rdintens}} = \underset{(1.369)}{.472} + \underset{(.216)}{.321}\,\log(\mathit{sales}) + \underset{(.046)}{.050}\,\mathit{profmarg}$
   $n = 32$, $R^2 = .099$.
   a. Interpret the coefficient on log(sales). In particular, if sales increases by 10%, what is the estimated effect on rdintens? Is this an economically large effect?
   b. Test the hypothesis that R&D intensity does not change with sales against the alternative that it does increase with sales. Do the test at the 5% and 10% levels.
   c. Interpret the coefficient on profmarg. Is it economically large?
   d. Does profmarg have a statistically significant effect on rdintens?
3. Are rent rates influenced by the student population in a college town? Let rent be the average monthly rent paid on rental units in a college town in the United States. Let pop denote the total city population, avginc the average city income, and pctstu the student population as a percentage of the total population. One model to test for a relationship between rent rates and percentage of students in overall population is
   $\log(\mathit{rent}) = \beta_0 + \beta_1\,\log(\mathit{pop}) + \beta_2\,\log(\mathit{avginc}) + \beta_3\,\mathit{pctstu} + u$.
   a. State the null hypothesis that size of the student body relative to the population has no ceteris paribus effect on monthly rents. State the alternative that there is an effect.
   b. What signs do you expect for $\beta_1$ and $\beta_2$?
   c. The equation estimated using 1990 data from RENTAL.RAW for 64 college towns is
      $\widehat{\log(\mathit{rent})} = \underset{(.844)}{.043} + \underset{(.039)}{.066}\,\log(\mathit{pop}) + \underset{(.081)}{.507}\,\log(\mathit{avginc}) + \underset{(.0017)}{.0056}\,\mathit{pctstu}$
      $n = 64$, $R^2 = .458$.
      What is wrong with the statement: "A 10% increase in population is associated with about a 6.6% increase in rent"?
   d. Test the hypothesis stated in (a) at the 1% level.
4. Consider the estimated equation from Example 4.3, which can be used to study the effect of skipping class on college GPA:
   $\widehat{\mathit{colGPA}} = \underset{(.33)}{1.39} + \underset{(.094)}{.412}\,\mathit{hsGPA} + \underset{(.011)}{.015}\,\mathit{ACT} - \underset{(.026)}{.083}\,\mathit{skipped}$
   $n = 141$, $R^2 = .234$
   a. Find the 95% confidence interval for $\beta_{\mathit{hsGPA}}$.
   b. Can you reject the null hypothesis $H_0: \beta_{\mathit{hsGPA}} = .4$ against the two-sided alternative at the 5% level?
   c. Can you reject the null hypothesis $H_0: \beta_{\mathit{hsGPA}} = 1$ against the two-sided alternative at the 5% level?
5. In Section 4.5, we used as an example testing the rationality of assessments of housing prices. There, we used a log-log model in price and assess [see equation (4.47)]. Here, we use a level-level specification.
   a. In the simple regression model
      $\mathit{price} = \beta_0 + \beta_1\,\mathit{assess} + u$,
      the assessment is rational if $\beta_1 = 1$ and $\beta_0 = 0$. The estimated equation is
      $\widehat{\mathit{price}} = \underset{(16.27)}{-14.47} + \underset{(.049)}{.976}\,\mathit{assess}$
      $n = 88$, $\mathit{SSR} = 165{,}644.51$, $R^2 = .820$
      First, test the hypothesis that $H_0: \beta_0 = 0$ against a two-sided alternative. Then, test $H_0: \beta_1 = 1$ against a two-sided alternative. What do you conclude?
   b. To test the joint hypothesis that $\beta_0 = 0$ and $\beta_1 = 1$, we need the SSR in the restricted model. This amounts to computing $\sum_{i=1}^{n} (\mathit{price}_i - \mathit{assess}_i)^2$, where n = 88, since the residuals in the restricted model are just $\mathit{price}_i - \mathit{assess}_i$. (No estimation is needed for the restricted model because both parameters are specified under $H_0$.) This turns out to yield SSR = 209,448.99. Carry out the F test for the joint hypothesis. Is the null hypothesis rejected at the 1% level?
   c. Now, test $H_0: \beta_2 = 0$, $\beta_3 = 0$, and $\beta_4 = 0$ in the model
      $\mathit{price} = \beta_0 + \beta_1\,\mathit{assess} + \beta_2\,\mathit{lotsize} + \beta_3\,\mathit{sqrft} + \beta_4\,\mathit{bdrms} + u$.
      The R-squared from estimating this model using the same 88 houses is .829. Can we reject the null hypothesis at the 10% level?
6. Consider the multiple regression model with three independent variables, under the classical linear model assumptions MLR.1 through MLR.6:
   $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$.
   You would like to test the null hypothesis $H_0: \beta_1 - 3\beta_2 = 1$.
   a. Let $\hat\beta_1$ and $\hat\beta_2$ denote the OLS estimators of $\beta_1$ and $\beta_2$. Find $\mathrm{Var}(\hat\beta_1 - 3\hat\beta_2)$ in terms of the variances of $\hat\beta_1$ and $\hat\beta_2$ and the covariance between them. What is the standard error of $\hat\beta_1 - 3\hat\beta_2$?
   b. Write the t statistic for testing $H_0: \beta_1 - 3\beta_2 = 1$.
   c. Define $\theta_1 = \beta_1 - 3\beta_2$ and $\hat\theta_1 = \hat\beta_1 - 3\hat\beta_2$. Write a regression equation involving $\beta_0$, $\theta_1$, $\beta_2$, and $\beta_3$ that allows you to directly obtain $\hat\theta_1$ and its standard error.
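The joint tests in Chapter 4, problem 5, rest on the F statistic, which can be written either in terms of restricted and unrestricted sums of squared residuals or, when both models share the same dependent variable, in terms of their R-squareds. The sketch below applies both forms to the figures quoted in the problem; the function names are hypothetical, and the critical values needed to finish each test still have to be looked up in an F table.

```python
def f_from_ssr(ssr_r: float, ssr_ur: float, q: int, df_ur: int) -> float:
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


def f_from_r2(r2_ur: float, r2_r: float, q: int, df_ur: int) -> float:
    """F = [(R2_ur - R2_r) / q] / [(1 - R2_ur) / df_ur]."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_ur)


n = 88

# Part (b): two restrictions (beta0 = 0, beta1 = 1); the unrestricted model
# has one slope, so df_ur = n - 1 - 1 = 86.
F_b = f_from_ssr(ssr_r=209448.99, ssr_ur=165644.51, q=2, df_ur=n - 1 - 1)

# Part (c): three exclusion restrictions; the unrestricted model has four
# slopes, so df_ur = n - 4 - 1 = 83. The restricted model is the simple
# regression of price on assess, with R-squared .820.
F_c = f_from_r2(r2_ur=0.829, r2_r=0.820, q=3, df_ur=n - 4 - 1)

print(f"F statistic for part (b): {F_b:.2f}")
print(f"F statistic for part (c): {F_c:.2f}")
```

The R-squared form is not available for part (b), because imposing $\beta_0 = 0$ and $\beta_1 = 1$ changes the dependent variable of the restricted model, which is why the problem supplies the restricted SSR directly.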
7. The following table was created based on results from three regressions using the data in CEOSAL2.RAW:

Dependent Variable: log(salary)

Independent Variables   (1)            (2)              (3)
log(sales)              .224 (.027)    .158 (.040)      .188 (.040)
log(mktval)             ---            .112 (.050)      .100 (.049)
profmarg                ---            -.0023 (.0022)   -.0022 (.0021)
ceoten                  ---            ---              .0171 (.0055)
comten                  ---            ---              -.0092 (.0033)
intercept               4.94 (0.20)    4.62 (0.25)      4.57 (0.25)
Observations            177            177              177
R-squared               .281           .304             .353

The variable mktval is the market value of the firm, profmarg is profit as a percentage of sales, ceoten is years as CEO with the current company, and comten is total years with the company.
a. Comment on the effect of profmarg on CEO salary based on the second and third regressions in the table.
b. Based on the third regression in the table, does market value have a significant effect in a two-sided test? Explain.
c. Interpret the coefficients on ceoten and comten in the third regression. Are the variables statistically significant for a two-sided test at the 5% level?
d. What do you make of the fact that longer tenure with the company, holding the other factors fixed, is associated with a lower salary?

Computer exercises

C4.1 The following model can be used to study whether campaign expenditures affect election outcomes:
$$\mathit{voteA} = \beta_0 + \beta_1 \log(\mathit{expendA}) + \beta_2 \log(\mathit{expendB}) + \beta_3\,\mathit{prtystrA} + u,$$
where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A's party).
a. What is the interpretation of $\beta_1$?
b. In terms of the parameters, state the null hypothesis that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditures.
c. I estimate the given model using the data in VOTE1.RAW. Report the SRF with standard errors in parentheses. Is A's expenditures variable statistically significant? What about B's expenditures? Can you use these results to test the hypothesis in (b)?
d. Write down the model that directly gives the t statistic for testing the hypothesis in (b).

C4.2 Use the data in LAWSCH85.RAW for this exercise.
a. Using the same model as problem 3 of chapter 3, state the null hypothesis that the rank of law schools has no ceteris paribus effect on median starting salary, and state a one-sided alternative hypothesis.
b. Based on the STATA output, interpret the rank coefficient. Can you reject the null hypothesis in a) at the 5% level?
c. Are features of the incoming class of students, LSAT and GPA, individually or jointly significant for explaining salary? (To account for missing data on LSAT and GPA, I estimated the restricted model using individuals only if their LSAT and GPA are not missing.)
d. Test whether the size of the entering class (clsize) or the size of the faculty (faculty) needs to be added to this equation by carrying out a single test at the 5% level. (Again I accounted for missing data on clsize and faculty.)

C4.3 Use the data in MLB1.RAW for this exercise.
a. I estimate the model in equation (4.31) and drop the variable rbisyr. What happens to the statistical significance of hrunsyr? What about the size of the coefficient on hrunsyr?
b. I then add the variables runsyr (runs per year), fldperc (fielding percentage), and sbasesyr (stolen bases per year) to the model in (a). Which of these factors are individually significant? Interpret the significant coefficient(s).
c. In the model in (b), test the joint significance of bavg, fldperc, and sbasesyr.

C4.4 Use the data in WAGE2.RAW for this exercise.
a. Consider the standard wage equation
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + u.$$
State the null hypothesis that another year of general workforce experience has the same effect on log(wage) as another year of tenure with the current employer.
b. Test the null hypothesis in (a) against a two-sided alternative, at the 5% significance level, by constructing a 95% confidence interval. What do you conclude?

C4.5 Refer to the example used in Section 4.4. I will use the data set TWOYEAR.RAW.
a. The variable phsrank is the person's high school percentile. (A larger number is better. For example, 90 means you are ranked better than 90 percent of your graduating class.) Find the smallest, largest, and average phsrank in the sample.
b. I then add phsrank to equation (4.26) and estimate the new model. Report the OLS estimates in the usual form. Is phsrank statistically significant? How much is 10 percentage points of high school rank worth in terms of wage?
c. Does adding phsrank to (4.26) substantively change the conclusions on the returns to two- and four-year colleges? Explain.

C4.6 Use the data in DISCRIM.RAW to answer this question. (See also Computer Exercise C3.7 in Chapter 3.)
a. I estimate the model using STATA:
$$\log(\mathit{psoda}) = \beta_0 + \beta_1\,\mathit{prpblck} + \beta_2 \log(\mathit{income}) + \beta_3\,\mathit{prppov} + u.$$
Report the SRF with standard errors, number of observations, and $R^2$. Is $\hat\beta_1$ statistically different from zero at the 5% level against a two-sided alternative? What about at the 1% level?
b. What is the correlation between log(income) and prppov? For both variables, report the t statistics and two-sided p-values.
c. To the regression in (a), add the variable log(hseval) (hseval is median housing value at the zip code level). Interpret its coefficient and report the two-sided p-value for $H_0\colon \beta_{\log(\mathit{hseval})} = 0$.
d. In the regression in (c), what happens to the individual statistical significance of log(income) and prppov? Are these variables jointly significant? (Compute a p-value.) What do you make of your answers?
e. Given the results of the previous regressions, which one would you report as most reliable in determining whether the racial makeup of a zip code influences local fast-food prices? What is the effect of prpblck on the price of soda based on the model you picked as the most reliable?

C4.7 Use the data in HPRICE1.dta to answer this question. We set a population model
$$\log(\mathit{price}) = \beta_0 + \beta_1\,\mathit{sqrft} + \beta_2\,\mathit{bdrms} + u.$$
a. You are interested in estimating and obtaining a confidence interval for the percentage change in price when a 150-square-foot bedroom is added to a house.
In decimal form, this is $\theta_1 = 150\beta_1 + \beta_2$. Use the data to estimate $\theta_1$.
b. Write $\beta_2$ in terms of $\theta_1$ and $\beta_1$ and plug this into the regression equation above.
c. Use the new regression you get in b) to obtain a standard error for $\hat\theta_1$ and use this standard error to construct a 95% confidence interval.

Chapter 5

Computer exercises

C5.1 Use the data in WAGE1.dta for this exercise.
a. Estimate the equation
$$\mathit{wage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + u.$$
Save the residuals and plot a histogram.
b. Repeat part (a), but with $\log(\mathit{wage})$ as the dependent variable.
c. Would you say that Assumption MLR.6 is closer to being satisfied for the level-level model or the log-level model?

C5.2 Use the data in GPA2.dta for this exercise.
a. Using all 4,137 observations, estimate the equation
$$\mathit{colgpa} = \beta_0 + \beta_1\,\mathit{hsperc} + \beta_2\,\mathit{sat} + u$$
and report the results.
b. Reestimate the equation in part (a), using the first 2,070 observations.
c. Find the ratio of the standard errors on hsperc from parts (a) and (b). Compare this with the result from equation (5.10) in the book.

Chapter 6

1. The following SRF was estimated using the data in CEOSAL.RAW:
$$\widehat{\log(\mathit{salary})} = \underset{(.324)}{4.322} + \underset{(.033)}{.276}\,\log(\mathit{sales}) + \underset{(.0129)}{.0215}\,\mathit{roe} - \underset{(.00026)}{.00008}\,\mathit{roe}^2$$
$$n = 209,\quad R^2 = .282.$$
This model allows roe to have a diminishing effect on log(salary). Is this generality necessary? Explain why or why not.

2. Let $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ be the OLS estimates from the regression of $y_i$ on $x_{i1}, \ldots, x_{ik}$, $i = 1, 2, \ldots, n$. For nonzero constants $c_0, c_1, \ldots, c_k$, argue that the OLS intercept and slopes from the regression of $c_0 y_i$ on $c_1 x_{i1}, \ldots, c_k x_{ik}$, $i = 1, 2, \ldots, n$, are given by
$$\tilde\beta_0 = c_0 \hat\beta_0,\quad \tilde\beta_1 = (c_0/c_1)\hat\beta_1,\ \ldots,\ \tilde\beta_k = (c_0/c_k)\hat\beta_k.$$
(Hint: Use the fact that the $\hat\beta_j$ solve the first order conditions in (3.13), and the $\tilde\beta_j$ must solve the first order conditions involving the rescaled dependent and independent variables.)
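Before attempting the argument in Chapter 6, problem 2, it can help to see the claim hold numerically. The sketch below checks only the one-regressor case ($k = 1$) with made-up data; the helper `ols_simple` and the data values are hypothetical, and a numerical check is of course not a substitute for the first-order-condition argument the problem asks for.

```python
def ols_simple(x, y):
    """Return (intercept, slope) from the simple OLS regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up data, used only to illustrate the rescaling result.
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 2.9, 5.2, 5.8, 8.1]
c0, c1 = 100.0, 0.5  # rescale y by c0 and x by c1

b0, b1 = ols_simple(x, y)
t0, t1 = ols_simple([c1 * xi for xi in x], [c0 * yi for yi in y])

print("intercept:", t0, "vs c0*b0 =", c0 * b0)
print("slope:    ", t1, "vs (c0/c1)*b1 =", (c0 / c1) * b1)
```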
3. Using the data in RDCHEM.dta, the following equation was obtained by OLS:
$$\widehat{\mathit{rdintens}} = \underset{(0.429)}{2.613} + \underset{(0.00014)}{0.00030}\,\mathit{sales} - \underset{(0.0000000037)}{0.0000000070}\,\mathit{sales}^2$$
$$n = 32,\quad R^2 = .1484.$$
a. At what point does the marginal effect of sales on rdintens become negative?
b. Would you keep the quadratic term in the model? Explain.
c. Define salesbil as sales measured in billions of dollars: salesbil = sales/1,000. Rewrite (without re-estimating the model) the estimated equation with $\mathit{salesbil}$ and $\mathit{salesbil}^2$ as the independent variables. Be sure to report standard errors and the R-squared.
d. For the purpose of reporting the result, which equation do you prefer?

4. The following model allows the return to education to depend upon the total amount of both parents' education, called pareduc:
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{educ}\cdot\mathit{pareduc} + \beta_3\,\mathit{exper} + \beta_4\,\mathit{tenure} + u.$$
a. Use calculus to show that the return to another year of education in this model is roughly
$$\Delta\log(\mathit{wage})/\Delta\mathit{educ} = \beta_1 + \beta_2\,\mathit{pareduc}.$$
What sign do you expect for $\beta_2$? Why?
b. Using the data in WAGE2.RAW, the estimated equation is
$$\widehat{\log(\mathit{wage})} = \underset{(.13)}{5.65} + \underset{(.010)}{.047}\,\mathit{educ} + \underset{(.00021)}{.00078}\,\mathit{educ}\cdot\mathit{pareduc} + \underset{(.004)}{.019}\,\mathit{exper} + \underset{(.003)}{.010}\,\mathit{tenure}$$
$$n = 722,\quad R^2 = .169.$$
(Only 722 observations contain full information on parents' education.) Interpret the coefficient on the interaction term. It might help to choose two specific values for pareduc, for example, pareduc = 32 if both parents have a college education, or pareduc = 24 if both parents have a high school education, and to compare the estimated return to educ.
c. When pareduc is added as a separate variable to the equation, we get:
$$\widehat{\log(\mathit{wage})} = \underset{(.38)}{4.94} + \underset{(.027)}{.097}\,\mathit{educ} + \underset{(.017)}{.033}\,\mathit{pareduc} - \underset{(.0012)}{.0016}\,\mathit{educ}\cdot\mathit{pareduc} + \underset{(.004)}{.020}\,\mathit{exper} + \underset{(.003)}{.010}\,\mathit{tenure}$$
$$n = 722,\quad R^2 = .174.$$
Does the estimated return to education now depend positively on parent education? Test the null hypothesis that the return to education does not depend on parent education.
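Problem 4(b) suggests comparing the estimated return to educ at two values of pareduc. A minimal sketch of that comparison follows, plugging the point estimates quoted in 4(b) into the expression $\beta_1 + \beta_2\,\mathit{pareduc}$ from 4(a); the function name `return_to_educ` is illustrative, and the interpretation of the resulting numbers is left to the reader.

```python
def return_to_educ(pareduc, b_educ=0.047, b_interact=0.00078):
    """Estimated return to one more year of education, in decimal form.

    Defaults are the point estimates quoted in problem 4(b):
    coefficient on educ and on the educ*pareduc interaction.
    """
    return b_educ + b_interact * pareduc


college = return_to_educ(32)      # both parents with a college education
high_school = return_to_educ(24)  # both parents with a high school education

print(f"pareduc = 32: {100 * college:.2f}% per year of education")
print(f"pareduc = 24: {100 * high_school:.2f}% per year of education")
print(f"difference:   {100 * (college - high_school):.2f} percentage points")
```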
5. In Example 4.2, where the percentage of students receiving a passing score on a tenth-grade math exam (math10) is the dependent variable, does it make sense to include sci10 (the percentage of tenth graders passing a science exam) as an additional explanatory variable?

6. When $\mathit{atndrte}^2$ and $\mathit{ACT}\cdot\mathit{atndrte}$ are added to the equation estimated in (6.19), the R-squared becomes 0.232. Are these additional terms jointly significant at the 10% level? Would you include them in the model?

7. Suppose we want to estimate the effects of alcohol consumption (alcohol) on college grade point average (colGPA). In addition to collecting information on grade point average and alcohol usage, we also obtain attendance information (say, percentage of lectures attended, called attend). A standardized test score (say, SAT) and high school GPA (hsGPA) are also available.
a. Should we include attend along with alcohol as explanatory variables in a multiple regression model? (Think about how you would interpret $\beta_{\mathit{alcohol}}$.)
b. Should SAT and hsGPA be included as explanatory variables? Explain.

Computer exercises

C6.1 I use the data in KIELMC.RAW for the year 1981 to run the following regressions. The data are for houses that sold during 1981 in North Andover, Massachusetts; 1981 was the year construction began on a local garbage incinerator.
a. To study the effects of the incinerator location on housing price, consider the simple regression model
$$\log(\mathit{price}) = \beta_0 + \beta_1 \log(\mathit{dist}) + u,$$
where price is housing price in dollars and dist is distance from the house to the incinerator measured in feet. Interpreting this equation causally, what sign do you expect for $\beta_1$ if the presence of the incinerator depresses housing prices?
b. I estimate this simple equation. Report the regression results and interpret the results.
c. To the simple regression model in (a), I add the variables log(intst), log(area), log(land), rooms, baths, and age, where intst is distance from the home to the interstate (highway) measured in feet, area is square footage of the house, land is the lot size in square feet, rooms is total number of rooms, baths is number of bathrooms, and age is age of the house in years. Now, what do you conclude about the effects of the incinerator?
d. Next I add $[\log(\mathit{intst})]^2$ to the model from c). Now what happens? What do you conclude about the importance of functional form?
e. Is the square of log(dist) significant when I add it to the model in d)?

C6.2 I use the data in WAGE1.RAW for this exercise.
a. I estimate the equation
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{exper}^2 + u.$$
Report the results using the usual format.
b. Is $\mathit{exper}^2$ statistically significant at the 1% level?
c. Find the return to the fifth year of experience. What is the return to the twentieth year of experience? (Do not use approximations.)
d. At what value of exper does additional experience actually lower predicted log(wage)? How many people have more experience in this sample?

C6.3 Consider a model where the return to education depends upon the amount of work experience (and vice versa):
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{educ}\cdot\mathit{exper} + u.$$
a. Show that the return to another year of education, holding exper fixed, is $\beta_1 + \beta_3\,\mathit{exper}$.
b. State the null hypothesis that the return to education does not depend on the level of exper. What do you think is the appropriate alternative?
c. Test the null hypothesis in (b) against your stated alternative.
d. Let $\theta_1$ denote the return to education. Write down the model that directly gives the estimate and standard error for $\theta_1$.

C6.4 Use the housing price data in HPRICE1.dta for this exercise.
a. Estimate the model
$$\log(\mathit{price}) = \beta_0 + \beta_1 \log(\mathit{lotsize}) + \beta_2 \log(\mathit{sqrft}) + \beta_3\,\mathit{bdrms} + u$$
and report the results in the usual OLS format (as on page 154).
b. Find the predicted value of log(price) when lotsize = 20,000, sqrft = 2,500, and bdrms = 4.
Using the method of equation (6.43), find the predicted value of price at the same values of the explanatory variables.

C6.5 Use the data in VOTE1.dta for this exercise.
a. Consider a model with an interaction between expenditures:
$$\mathit{voteA} = \beta_0 + \beta_1\,\mathit{prtystrA} + \beta_2\,\mathit{expendA} + \beta_3\,\mathit{expendB} + \beta_4\,\mathit{expendA}\cdot\mathit{expendB} + u.$$
What is the partial effect of expendB on voteA, holding prtystrA and expendA fixed? What is the partial effect of expendA on voteA? Is the expected sign for $\beta_4$ obvious?
b. Estimate the equation in a) and report the results in the usual form. Is the interaction term statistically significant?
c. Find the average of expendA in the sample. Fix expendA at 300 (for $300,000). What is the estimated effect of another $100,000 spent by Candidate B on voteA? Is this a large effect?
d. Now fix expendB at 100. What is the estimated effect of $\Delta\mathit{expendA} = 100$ on voteA? Is this a large effect?
e. Now, estimate a model that replaces the interaction with shareA, Candidate A's percentage share of total campaign expenditures. Does it make sense to hold both expendA and expendB fixed, while changing shareA?
f. In the model from e), find the partial effect of expendB on voteA, holding prtystrA and expendA fixed. Evaluate this at expendA = 300 and expendB = 0 and comment on the results.

C6.6 Use the data in ATTEND.dta for this exercise.
a. Given the population regression function in Example 6.3, we have
$$\frac{\partial\,\mathit{stndfnl}}{\partial\,\mathit{priGPA}} = \beta_2 + 2\beta_4\,\mathit{priGPA} + \beta_6\,\mathit{atndrte}.$$
Use equation (6.19) to estimate the partial effect when priGPA = 2.59 and atndrte = 82. Interpret your estimate.
b. Reparameterize the model to capture the above effect by a single parameter and estimate the reparameterized model:
$$\mathit{stndfnl} = \theta_0 + \theta_1\,\mathit{atndrte} + \theta_2\,\mathit{priGPA} + \theta_3\,\mathit{ACT} + \theta_4(\mathit{priGPA} - 2.59)^2 + \theta_5\,\mathit{ACT}^2 + \theta_6\,\mathit{priGPA}\,(\mathit{atndrte} - 82) + u,$$
where $\theta_2 = \beta_2 + 2\beta_4(2.59) + \beta_6(82)$. (Note that the intercept has changed, but this is not important.) Use this to obtain the standard error of $\hat\theta_2$. Is it statistically significant?
C6.7 Use the data in HPRICE1.dta for this exercise.
a. Estimate the model
$$\mathit{price} = \beta_0 + \beta_1\,\mathit{lotsize} + \beta_2\,\mathit{sqrft} + \beta_3\,\mathit{bdrms} + u$$
and report the results in the usual form, including the standard error of the regression. Obtain the predicted price when we plug in lotsize = 10,000, sqrft = 2,300, and bdrms = 4; round this price to the nearest dollar.
b. Run a regression that allows you to put a 95% confidence interval around the predicted value in a). Note that your prediction will differ somewhat due to rounding error.

Chapter 7

1. In Example 7.2, let noPC be a dummy variable equal to one if the student does not own a PC, and zero otherwise.
a. If noPC is used in place of PC in equation (7.6), what happens to the intercept in the estimated equation? What will be the coefficient on noPC?
b. What will happen to the R-squared if noPC is used in place of PC?
c. Should PC and noPC both be included as independent variables in the model? Explain.

2. Suppose you collect data from a survey on wages, education, and gender. In addition, you ask for information about marijuana usage. The original question is: "On how many separate occasions last month did you smoke marijuana?"
a. Write an equation that would allow you to estimate the effects of marijuana usage on wage, while controlling for other factors. You should be able to make statements such as, "Smoking marijuana five more times per month is estimated to change wage by x%."
b. Write a model that would allow you to test whether drug usage has different effects on wages for men and women. How could you test that there are no differences in the effects of drug usage for men and women?
c. Suppose you think it is better to measure marijuana usage by putting people into one of four categories: nonuser, light user (1 to 5 times per month), moderate user (6 to 10 times per month), and heavy user (more than 10 times per month). Now write a model that allows you to estimate the effects of marijuana usage on wage.
d. Using the model in c), explain in detail how to test the null hypothesis that marijuana usage has no effect on wage. Be very specific and include a careful listing of degrees of freedom.
e. What are some potential problems with drawing causal inference using the survey data that you collected?

Computer Exercises

C7.1 Use the data in WAGE2.dta for this exercise.
a. Estimate the model
$$\log(\mathit{wage}) = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{exper} + \beta_3\,\mathit{tenure} + \beta_4\,\mathit{married} + \beta_5\,\mathit{black} + \beta_6\,\mathit{south} + \beta_7\,\mathit{urban} + u$$
and report the results in the usual form. Holding other factors fixed, what is the approximate difference in monthly salary between blacks and nonblacks? Is this difference statistically significant?
b. Expand the model in a) to allow the return to education to depend on race and test whether the return to education does depend on race.
c. Again, start with the model in a), but now allow wages to differ across four groups of people: married and black, married and nonblack, single and black, and single and nonblack. What is the estimated wage differential between married blacks and married nonblacks?

C7.2 Use the data in GPA2.dta for this exercise.
a. Consider the equation
$$\mathit{colgpa} = \beta_0 + \beta_1\,\mathit{hsize} + \beta_2\,\mathit{hsize}^2 + \beta_3\,\mathit{hsperc} + \beta_4\,\mathit{sat} + \beta_5\,\mathit{female} + \beta_6\,\mathit{athlete} + u,$$
where colgpa is cumulative college grade point average, hsize is size of high school graduating class, in hundreds, hsperc is academic percentile in graduating class, sat is combined SAT score, female is a binary gender variable, and athlete is a binary variable, which is one for student-athletes. What are your expectations for the coefficients in this equation? Which ones are you unsure about?
b. Estimate the equation in a) and report the results in the usual form. What is the estimated GPA differential between athletes and nonathletes? Is it statistically significant?
c. Drop sat from the model and reestimate the equation. Now what is the estimated effect of being an athlete?
Discuss why the estimate is different from that obtained in b).
d. In the model from a), allow the effect of being an athlete to differ by gender and test the null hypothesis that there is no ceteris paribus difference between women athletes and women nonathletes.
e. Does the effect of sat on colgpa differ by gender? Justify your answer.

Chapter 8

Computer Exercises

C8.1
a. Use the data in HPRICE1.dta to obtain the heteroskedasticity-robust standard errors for equation (8.17). Discuss any important differences with the usual standard errors.
b. Repeat a) for equation (8.18).
c. What does this example suggest about heteroskedasticity and the transformation used for the dependent variable?

Chapter 9

Computer Exercises

C9.1 Let math10 denote the percentage of students at a Michigan high school receiving a passing score on a standardized math test (see also Example 4.2). We are interested in estimating the effect of per-student spending on math performance. A simple model is
$$\mathit{math10} = \beta_0 + \beta_1 \log(\mathit{expend}) + \beta_2 \log(\mathit{enroll}) + \beta_3\,\mathit{poverty} + u,$$
where poverty is the percentage of students living in poverty.
a. The variable lnchprg is the percentage of students eligible for the federally funded school lunch program. Why is this a sensible proxy variable for poverty?
b. Estimate the model with and without lnchprg as an explanatory variable and report your regression results. Compare the effect of expenditures on math10 from both regressions.
c. Does it appear that pass rates are lower at larger schools, other factors being equal? Explain.
d. Interpret the coefficient on lnchprg.
e. What do you make of the substantial increase in $R^2$ after adding lnchprg?

C9.2 Use the data set WAGE2.dta for this exercise.
a. Use the variable KWW (the "knowledge of the world of work" test score) as a proxy variable for ability in place of IQ in Example 9.3. What is the estimated return to education?
b. Now, use IQ and KWW together as proxy variables. What happens to the estimated return to education?
c. In b), are IQ and KWW individually significant? Are they jointly significant?

C9.3 Use the data from JTRAIN.dta for this exercise.
a. Consider the simple regression model
$$\log(\mathit{scrap}) = \beta_0 + \beta_1\,\mathit{grant} + u,$$
where scrap is the firm scrap rate and grant is a dummy variable indicating whether a firm received a job training grant. Can you think of some reasons why the unobserved factors in u might be correlated with grant?
b. Estimate the simple regression model using the data for 1988. (You should have 54 observations.) Does receiving a job training grant significantly lower a firm's scrap rate?
c. Now, add $\log(\mathit{scrap}_{87})$ as an explanatory variable. How does this change the estimated effect of grant? Interpret the coefficient on grant. Is it statistically significant at the 5% level against the one-sided alternative $H_a\colon \beta_{\mathit{grant}} < 0$?
d. Test the null hypothesis that the parameter on $\log(\mathit{scrap}_{87})$ is one against the two-sided alternative. Report the p-value of the test.
e. Repeat c) and d), using heteroskedasticity-robust standard errors, and briefly discuss any notable differences.

C9.4 You need to use two data sets for this exercise, JTRAIN2.dta and JTRAIN3.dta. (Before solving this problem, read the data dictionary regarding both data sets.) The former is data from a job training experiment, where job training was assigned by randomization. The latter contains observational data (a random sample from the population of (American) men working in 1978), where job training participation was largely determined by individual choice. The two data sets cover the same time period.
a. In the data set JTRAIN2.dta, what fraction of the men received job training? What is the fraction in JTRAIN3.dta? Why do you think there is such a big difference?
b. Using JTRAIN2.dta, run a simple regression of re78 on train. What is the estimated effect of participating in job training on real earnings?
c. Now add as controls to the regression in b) the variables re74, re75, educ, age, black, and hisp.
Does the estimated effect of job training on re78 change much? How come?
d. Do the regressions in b) and c) using the data in JTRAIN3.dta, reporting only the estimated coefficients on train, along with their t statistics. What is the effect now of controlling for the extra factors, and why?
e. Define $\mathit{avgre} = (\mathit{re74} + \mathit{re75})/2$. Find the sample averages, standard deviations, and minimum and maximum values in the two data sets. Are these data sets representative of the same populations in 1978?
f. Almost 96% of men in the data set JTRAIN2.dta have avgre less than $10,000. Using only these men, run the regression of re78 on train, re74, re75, educ, age, black, and hisp, and report the training estimate and its t statistic. Run the same regression for JTRAIN3.dta, using only men with avgre less than $10,000. For the subsample of low-income men, how do the estimated training effects compare across the experimental and nonexperimental data sets?
g. Now use each data set to run the simple regression of re78 on train, but only for men who were unemployed in 1974 and 1975. How do the training estimates compare now? If you find that the estimate from the observational data is higher than that from the experimental data, can you think of an explanation?
h. Using your findings from the previous regressions, discuss the potential importance of having comparable populations underlying comparisons of experimental and nonexperimental estimates.

Chapter 13

1. In Example 13.1, assume that the averages of all factors other than educ have remained constant over time and that the average level of education is 12.2 for the 1972 sample and 13.3 for the 1984 sample. Using the estimates in Table 13.1, find the estimated change in average fertility between 1972 and 1984. (Be sure to account for the intercept change and the change in average education.)
2. Using the data in KIELMC.dta, the following two equations were estimated using the years 1978 and 1981:
$$\widehat{\log(\mathit{price})} = \underset{(.26)}{11.49} - \underset{(.058)}{.547}\,\mathit{nearinc} + \underset{(.080)}{.394}\,\mathit{y81}\cdot\mathit{nearinc}$$
$$n = 321,\quad R^2 = .220$$
and
$$\widehat{\log(\mathit{price})} = \underset{(.27)}{11.18} + \underset{(.044)}{.563}\,\mathit{y81} - \underset{(.067)}{.403}\,\mathit{y81}\cdot\mathit{nearinc}$$
$$n = 321,\quad R^2 = .337.$$
The estimates on the interaction term $\mathit{y81}\cdot\mathit{nearinc}$ from the above two equations are very different from that in equation (13.9). Explain the difference between these two regressions and equation (13.9).

3. Suppose we want to estimate the effect of several variables on annual saving and that we have a panel data set on individuals collected on January 31, 1990, and January 31, 1992. If we include a year dummy for 1992 and use first differencing, can we also include age in the original model (the model before differencing)? Explain.

Computer Exercises

C13.1 Use the data in FERTIL1.data for this exercise.
a. In the equation estimated in Example 13.1, test whether living environment at age 16 has an effect on fertility. (The base group is large city.) Report the value of the F statistic and the p-value.
b. Test whether region of the country at age 16 (South is the base group) has an effect on fertility.
c. Add the interaction terms $\mathit{y74}\cdot\mathit{educ}$, $\mathit{y76}\cdot\mathit{educ}$, ..., and $\mathit{y84}\cdot\mathit{educ}$ to the model estimated in Table 13.1. Explain what these terms represent. Are they jointly significant?
d. Based on the SRF you got in c), find the fertility level of 1984 relative to the base year 1972, first at 12 years of education and then at the sample mean of education in 1984. Explain how we can tell whether these two estimates are significant; you only need to suggest a regression to run for each situation (educ = 12 and educ at the sample mean of 1984).

C13.2 Use the data in CPS78_85.dat for this exercise.
a. How do you interpret the coefficient on $\mathit{y85}$ in equation (13.2)? Does it have an interesting interpretation? (Be careful here; you must account for the interaction terms $\mathit{y85}\cdot\mathit{educ}$ and $\mathit{y85}\cdot\mathit{female}$.)
b. Holding other factors fixed, what is the estimated percent increase in nominal wage for a male with 12 years of education over this time period? Propose a regression to obtain a confidence interval for this estimate.
c. Reestimate equation (13.2) but let all wages be measured in 1978 dollars. In particular, define the real wage as rwage = wage for 1978 and as rwage = wage/1.65 for 1985. Now use $\log(\mathit{rwage})$ in place of $\log(\mathit{wage})$ in estimating (13.2). Before running the regression, try to predict which coefficients will differ from those in equation (13.2).
d. Explain why the $R^2$ from your regression in c) is not the same as in equation (13.2).
e. Describe how union participation changed from 1978 to 1985.
f. Starting with equation (13.2), test whether the union wage differential changed over time.
g. Do your findings in e) and f) conflict? Explain.

C13.3 Use the data in KIELMC.dta for this exercise.
a. The variable dist is the distance from each home to the incinerator site, in feet. Consider the model
$$\log(\mathit{price}) = \beta_0 + \delta_0\,\mathit{y81} + \beta_1 \log(\mathit{dist}) + \delta_1\,\mathit{y81}\cdot\log(\mathit{dist}) + u.$$
If building the incinerator reduces the value of homes closer to the site, what is the sign of $\delta_1$? What does it mean if $\beta_1 > 0$?
b. Estimate the model in a) and report the results in the usual form. Interpret the coefficient on $\mathit{y81}\cdot\log(\mathit{dist})$. What do you conclude?
c. Add age, $\mathit{age}^2$, rooms, baths, $\log(\mathit{intst})$, $\log(\mathit{land})$, and $\log(\mathit{area})$ to the equation. Now, what do you conclude about the effect of the incinerator on housing values?

C13.4 For this exercise, we use JTRAIN.dta to determine the effect of the job training grant on hours of job training per employee. The basic model for the three years is
$$\mathit{hrsemp}_{it} = \beta_0 + \delta_1\,\mathit{d88}_t + \delta_2\,\mathit{d89}_t + \beta_1\,\mathit{grant}_{it} + \beta_2\,\mathit{grant}_{i,t-1} + \beta_3 \log(\mathit{employ}_{it}) + a_i + u_{it}.$$
a. Estimate the equation using first differencing. How many firms are used in the estimation?
How many total observations would be used if each firm had data on all variables for all three time periods?
b. Interpret the coefficient on grant and comment on its significance.
c. Is it surprising that $\mathit{grant}_{-1}$ is insignificant? Explain.
d. Do larger firms train their employees more or less, on average? How big are the differences in training due to firm size?

Chapter 15

1. Consider a simple model to estimate the effect of personal computer (PC) ownership on college grade point average for graduating seniors at a large public university:
$$\mathit{GPA} = \beta_0 + \beta_1\,\mathit{PC} + u,$$
where PC is a binary variable indicating PC ownership.
a. Why might PC ownership be correlated with u?
b. Explain why PC is likely to be related to parents' annual income. Does this mean parental income is a good IV for PC? Why or why not?
c. Suppose that, four years ago, the university gave grants to buy computers to roughly one-half of the incoming students, and the students who received grants were randomly chosen. Carefully explain how you would use this information to construct an instrumental variable for PC.

2. Suppose that you wish to estimate the effect of class attendance on student performance, as in Example 6.3. A basic model is
$$\mathit{stndfnl} = \beta_0 + \beta_1\,\mathit{atndrte} + \beta_2\,\mathit{priGPA} + \beta_3\,\mathit{ACT} + u.$$
a. Let dist be the distance from the student's living quarters to the lecture hall. Assuming that dist and u are uncorrelated, what other assumption must dist satisfy in order to be a valid IV for atndrte?
b. Suppose, as in equation (6.18), we add the interaction term $\mathit{priGPA}\cdot\mathit{atndrte}$. What might be a good IV for $\mathit{priGPA}\cdot\mathit{atndrte}$? [Hint: if $E(u \mid \mathit{priGPA}, \mathit{ACT}, \mathit{dist}) = 0$, as happens when priGPA, ACT, and dist are all exogenous, then any function of priGPA and dist is uncorrelated with u.]

3. Consider the simple regression model
$$y = \beta_0 + \beta_1 x + u$$
and let z be a binary instrumental variable for x.
Use (15.10) to show that the IV estimator $\hat\beta_1$ can be written as
$$\hat\beta_1 = (\bar y_1 - \bar y_0)/(\bar x_1 - \bar x_0),$$
where $\bar y_0$ and $\bar x_0$ are the sample averages of $y_i$ and $x_i$ over the part of the sample with $z_i = 0$, and where $\bar y_1$ and $\bar x_1$ are the sample averages of $y_i$ and $x_i$ over the part of the sample with $z_i = 1$. This estimator, known as a grouping estimator, was first suggested by Wald (1940).

4. Refer to equations (15.19) and (15.20). Assume that $\sigma_u = \sigma_x$, so that the population variation in the error term is the same as it is in $x$. Suppose that the instrumental variable, $z$, is slightly correlated with $u$: $\mathrm{Corr}(z, u) = 0.1$. Suppose that $z$ and $x$ have a somewhat stronger correlation: $\mathrm{Corr}(z, x) = 0.2$.
a. What is the asymptotic bias in the IV estimator?
b. How much correlation would have to exist between $u$ and $x$ before OLS has more asymptotic bias than 2SLS?

5. The following is a simple model to measure the effect of a school choice program on standardized test performance (see Rouse [1998] for motivation):
$$\mathit{score} = \beta_0 + \beta_1\,\mathit{choice} + \beta_2\,\mathit{faminc} + u_1,$$
where score is the score on a statewide test, choice is a binary variable indicating whether a student attended a choice school in the last year, and faminc is family income. The IV for choice is grant, the dollar amount granted to students to use for tuition at choice schools. The grant amount differed by family income level, which is why we control for faminc in the equation.
a. Even with faminc in the equation, why might choice be correlated with $u_1$?
b. If, within each income class, the grant amounts were assigned randomly, is grant uncorrelated with $u_1$?
c. Write the reduced form equation for choice. What is needed for grant to be partially correlated with choice?

6. Suppose that, in equation (15.8), you do not have a good instrumental variable candidate for skipped. But you have two other pieces of information on students: combined SAT score and cumulative GPA prior to the semester. What would you do instead of IV estimation?
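The grouping (Wald) form in Chapter 15, problem 3 can be checked numerically before it is derived. The sketch below assumes that (15.10) is the usual simple IV estimator, the ratio of the sample covariance of z and y to the sample covariance of z and x; the function names and the small data set are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def iv_slope(y, x, z):
    """Simple IV estimator: sum (z - zbar)(y - ybar) / sum (z - zbar)(x - xbar)."""
    ybar, xbar, zbar = mean(y), mean(x), mean(z)
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den


def wald_slope(y, x, z):
    """Grouping estimator: (ybar_1 - ybar_0) / (xbar_1 - xbar_0) for binary z."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))


# Made-up sample with a binary instrument.
z = [0, 0, 0, 1, 1, 1, 1, 0]
x = [1.0, 2.0, 1.5, 3.0, 4.0, 3.5, 2.5, 2.0]
y = [2.0, 3.1, 2.4, 5.2, 6.9, 6.1, 4.0, 3.3]

print("IV slope:  ", iv_slope(y, x, z))
print("Wald slope:", wald_slope(y, x, z))
```

The two printed values agree up to floating-point error, which is what the algebra in problem 3 is meant to establish in general.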
Computer Exercises

C15.1 Use the data in WAGE2.dta for this exercise.
a. In Example 15.2, using sibs as an instrument for educ, the IV estimate of the return to education is 0.122. To convince yourself that using sibs as an IV for educ is not the same as just plugging sibs in for educ and running an OLS regression, run the regression of $\log(\mathit{wage})$ on sibs and explain your findings.
b. The variable brthord is birth order (it is one for a first-born child, two for a second-born child, and so on). Explain why educ and brthord might be negatively correlated. Regress educ on brthord to determine whether there is a statistically significant negative correlation.
c. Use brthord as an IV for educ in equation (15.1). Report and interpret the results.
d. Now, suppose that we include number of siblings as an explanatory variable in the wage equation; this controls for family background, to some extent:
$$\log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{sibs} + u$$
Suppose that we want to use brthord as an IV for educ, assuming that sibs is exogenous. The reduced form for educ is
$$\mathit{educ} = \pi_0 + \pi_1 \mathit{sibs} + \pi_2 \mathit{brthord} + v$$
State and test the identification assumption.
e. Estimate the wage equation in d) using brthord as an IV for educ (and sibs as its own IV). Comment on the standard errors for $\hat{\beta}_{\mathit{educ}}$ and $\hat{\beta}_{\mathit{sibs}}$.
f. Using the fitted values from e), $\widehat{\mathit{educ}}$, compute the correlation between $\widehat{\mathit{educ}}$ and sibs. Use this result to explain your findings from e).

C15.2 Use the data in CARD.dta for this exercise.
a. The equation we estimated in Example 15.4 can be written as
$$\log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \dots + u$$
where the other explanatory variables are listed in Table 15.1. In order for IV to be consistent, the IV for educ, nearc4, must be uncorrelated with u. Could nearc4 be correlated with things in the error term, such as unobserved ability? Explain.
b. For a subsample of the men in the data set, an IQ score is available.
Regress IQ on nearc4 to check whether average IQ scores vary by whether the man grew up near a four-year college. What do you conclude?
c. Now, regress IQ on nearc4, smsa66, and the 1966 regional dummy variables reg662, ..., reg669. Are IQ and nearc4 related after the geographic dummy variables have been partialled out?
d. From b) and c), what do you conclude about the importance of controlling for smsa66 and the 1966 regional dummies in the $\log(\mathit{wage})$ equation?

C15.3 The purpose of this exercise is to compare the estimates and standard errors obtained by correctly using 2SLS with those obtained using inappropriate procedures. Use the data file WAGE2.dta.
a. Use a 2SLS routine to estimate the equation
$$\log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{tenure} + \beta_4 \mathit{black} + u$$
where sibs is the IV for educ. Report the results in the usual form.
b. Now, manually carry out 2SLS. That is, first regress educ on sibs, exper, tenure and black and obtain the fitted values $\widehat{\mathit{educ}}$. Then run the second stage regression of $\log(\mathit{wage})$ on $\widehat{\mathit{educ}}$, exper, tenure and black. Verify that the $\hat{\beta}_j$ are identical to those obtained from a), but that the standard errors are somewhat different. The standard errors obtained from the second stage regression when manually carrying out 2SLS are generally inappropriate.
c. Now, use the following two-step procedure, which generally yields inconsistent parameter estimates of the $\beta_j$, and not just inconsistent standard errors. In step one, regress educ on sibs only and obtain the fitted values $\widetilde{\mathit{educ}}$. (Note that this is an incorrect first stage regression.) Then, in the second step, run the regression of $\log(\mathit{wage})$ on $\widetilde{\mathit{educ}}$, exper, tenure and black. Compare the estimate of the return to education from this incorrect procedure with that from the proper procedure of a).
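The three procedures compared in C15.3 can be sketched on simulated data before running them on WAGE2.dta. The example below is a minimal illustration with NumPy only and does not use the course data: the variables x (endogenous regressor), w (exogenous control), z (instrument), the helper ols, and all coefficient values are hypothetical stand-ins chosen for this sketch. The instrument is deliberately correlated with the control so that omitting the control from the first stage matters.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(1)
n = 20_000

# Simulated stand-ins (all values are assumptions for illustration).
w = rng.normal(size=n)                 # exogenous control
z = 0.5 * w + rng.normal(size=n)       # instrument, correlated with w
u = rng.normal(size=n)                 # structural error
x = 0.6 * z + 0.4 * w + 0.7 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 0.5 * x + 0.3 * w + u        # true coefficient on x is 0.5

const = np.ones(n)
X = np.column_stack([const, x, w])     # structural regressors
Z = np.column_stack([const, z, w])     # instrument plus exogenous regressors

# Correct first stage: x on the instrument AND the exogenous control.
pi_hat, _ = ols(Z, x)
X_hat = np.column_stack([const, Z @ pi_hat, w])

# (a) 2SLS point estimates and standard errors; residuals use the actual x.
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
resid = y - X @ beta_2sls
sigma2 = resid @ resid / (n - X.shape[1])
se_2sls = np.sqrt(sigma2 * np.diag(np.linalg.inv(X_hat.T @ X_hat)))

# (b) Manual second stage: same coefficients, but the reported standard
# errors are built from residuals that use the fitted x, not the actual x.
beta_manual, se_manual = ols(X_hat, y)

# (c) Incorrect first stage that omits the control w.
pi_bad, _ = ols(np.column_stack([const, z]), x)
x_tilde = np.column_stack([const, z]) @ pi_bad
beta_bad, _ = ols(np.column_stack([const, x_tilde, w]), y)

print("2SLS coefficient on x     :", beta_2sls[1], "se:", se_2sls[1])
print("manual 2nd-stage coef on x:", beta_manual[1], "se:", se_manual[1])
print("incorrect-procedure coef  :", beta_bad[1])
```

In this setup the manual second stage reproduces the 2SLS coefficients but reports different standard errors, and the procedure with the incomplete first stage gives a different coefficient on x. With the real data, a dedicated 2SLS routine from your statistical package plays the role of step (a).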