Economics 470: Economic Fluctuations and Forecasts Take Home Midterm
Directions
This test will be posted at noon on Thursday, February 22nd. Your answers will be due at 12:05 PM on Tuesday, February 27th in our usual classroom. You may submit the exam before then in my office. Late submissions will not be accepted. You are welcome to use any print or internet resources available. You are not to discuss this exam with others (and that means all other sentient beings). If you are unsure whether something breaks these simple rules, think of the worst-case scenario: if you break these rules, I will give you a zero on this exam. I will be happy to help you with clarifying questions but will not answer questions that elucidate the material. Please address questions via e-mail ([email protected]); I will post answers to clarifying questions at http://faculty.wwu.edu/kriegj/Econ470/470%20Page.html and, if they are necessary to completing the exam, will e-mail the entire class at your official WWU e-mail address. Your answers may be typed or legibly handwritten. Feel free to submit Stata output if it helps explain your answer (it is easiest to "copy and paste" the Stata output, as a picture, into a Word file that contains your answers). Please answer on blank, white copy paper and include only your student ID number as an identifier. Do not include your name on the exam. Please start a new page for each problem, though you may combine parts of a problem onto a single page (so a complete exam will have at least 4 pages, since there are 4 problems). Points possible are in parentheses after each question. If you are looking for a diversion from taking this exam, remember that Adam Wright and I are presenting our work on the impacts of marijuana legalization in Parks Hall 441 at 4pm on Friday.

1. You want to estimate the following very simple time series regression model:
$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t$$
where $\varepsilon_t$ is an independently and identically distributed error term with mean 0 and variance $\sigma^2$. The independent variable $t$ is simply a time series count variable that is equal to 1 for the first observation, 2 for the second, 3 for the third, … through $T$ for the final, Tth (Teeth???), observation.

a. Is the OLS estimator of $\beta_1$ unbiased and efficient? (6)

For OLS to be unbiased and efficient, we need the classical assumptions to be true:
1. $E[\varepsilon] = 0$
2. $\operatorname{Var}[\varepsilon] = \sigma^2$
3. $\operatorname{Cov}[\varepsilon, X] = 0$
4. $\operatorname{Cov}[\varepsilon_i, \varepsilon_j] = 0$ for all $i \neq j$
5. Linearity

Since none of these assumptions are violated, OLS will produce unbiased and efficient estimates of $\beta_1$.
b. A friend who has not taken Econ. 470 asks why you are bothering to estimate this model with OLS. She points out that one possible method of estimating β1 is to compute the following:
$$\dot\beta = \frac{\sum_{t=2}^{T} (y_t - y_{t-1})}{T - 1}$$
Is your friend right? In other words, is β̇ an unbiased estimator of β1? (6)
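$$
\begin{aligned}
E[\dot\beta] &= E\left[\frac{\sum_{t=2}^{T}(y_t - y_{t-1})}{T-1}\right] \\
&= E\left[\frac{\sum_{t=2}^{T}\big(\beta_0 + \beta_1 t + \varepsilon_t - \beta_0 - \beta_1(t-1) - \varepsilon_{t-1}\big)}{T-1}\right] \\
&= E\left[\frac{\sum_{t=2}^{T}\big(\beta_1(t - t + 1) + \varepsilon_t - \varepsilon_{t-1}\big)}{T-1}\right] \\
&= E\left[\frac{\sum_{t=2}^{T}(\beta_1 + \varepsilon_t - \varepsilon_{t-1})}{T-1}\right] \\
&= E\left[\frac{(T-1)\beta_1 + \sum_{t=2}^{T}(\varepsilon_t - \varepsilon_{t-1})}{T-1}\right] \\
&= \frac{(T-1)\beta_1}{T-1} = \beta_1
\end{aligned}
$$
The last step uses $E[\varepsilon_t] = 0$, so every $E[\varepsilon_t - \varepsilon_{t-1}]$ term vanishes. So yes, $\dot\beta$ is an unbiased estimator of $\beta_1$.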
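A quick Monte Carlo check of this result is sketched below in Python. The language, parameter values, sample size, and seed are illustrative assumptions, not part of the exam. The sketch simulates $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$ with normal errors many times and averages $\dot\beta$ across replications; that average should sit very close to the true $\beta_1$.

```python
import random


def friend_estimator(y):
    """Average first difference: sum_{t=2}^{T} (y_t - y_{t-1}) / (T - 1)."""
    T = len(y)
    return sum(y[t] - y[t - 1] for t in range(1, T)) / (T - 1)


def simulate_mean(beta0=2.0, beta1=0.5, sigma=1.0, T=50, reps=5000, seed=470):
    """Average the friend's estimator over many samples of Y_t = beta0 + beta1*t + e_t,
    with e_t drawn i.i.d. N(0, sigma^2). All parameter values are illustrative."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [beta0 + beta1 * t + rng.gauss(0.0, sigma) for t in range(1, T + 1)]
        total += friend_estimator(y)
    return total / reps


mean_estimate = simulate_mean()
print(f"average of beta-dot over replications: {mean_estimate:.4f} (true beta1 = 0.5)")
```

With 5,000 replications the simulation noise in the average is well under 0.001, so any systematic gap from 0.5 would show up clearly.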
c. (BONUS—Don’t work on this until you get the rest done). Is β̇ more or less efficient than the OLS estimator of β1? (4)
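We know that
$$\operatorname{Var}[\hat\beta_1] = \frac{\sigma^2}{\sum_{t=1}^{T}(t - \bar t)^2},$$
where $\sigma^2$ is estimated by $\hat\sigma^2 = \sum_{t=1}^{T} e_t^2/(T-2)$. Recalling that the sum of the first $T$ consecutive integers is $T(T+1)/2$ and the sum of their squares is $T(T+1)(2T+1)/6$, we can rewrite the denominator as
$$\sum_{t=1}^{T}(t - \bar t)^2 = \sum_{t=1}^{T} t^2 - T\bar t^{\,2} = \frac{T(T+1)(2T+1)}{6} - \frac{T(T+1)^2}{4} = \frac{T(T^2-1)}{12},$$
so the entire variance is
$$\operatorname{Var}[\hat\beta_1] = \frac{12\sigma^2}{T(T^2-1)}.$$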
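The sum-of-squares identity is easy to confirm by brute force. The Python sketch below is an illustrative check, not part of the original answer. It compares the direct sum $\sum_{t=1}^{T}(t-\bar t)^2$ with the closed form $T(T^2-1)/12$ using exact rational arithmetic, so rounding never enters.

```python
from fractions import Fraction


def sum_sq_dev(T):
    """Directly compute sum_{t=1}^{T} (t - tbar)^2 with tbar = (T + 1)/2."""
    tbar = Fraction(T + 1, 2)
    return sum((t - tbar) ** 2 for t in range(1, T + 1))


def closed_form(T):
    """The closed form T(T^2 - 1)/12 derived from the two sum identities."""
    return Fraction(T * (T * T - 1), 12)


for T in (2, 3, 4, 10, 300):
    assert sum_sq_dev(T) == closed_form(T), T
print("sum of squared deviations matches T(T^2 - 1)/12")
```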
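The variance of the friend's estimator is
$$
\begin{aligned}
\operatorname{Var}[\dot\beta] &= E\big[(\dot\beta - E[\dot\beta])^2\big]
= E\left[\left(\frac{\sum_{t=2}^{T}(\beta_1 + \varepsilon_t - \varepsilon_{t-1} - \beta_1)}{T-1}\right)^2\right] \\
&= E\left[\left(\frac{\sum_{t=2}^{T}(\varepsilon_t - \varepsilon_{t-1})}{T-1}\right)^2\right]
= E\left[\left(\frac{\varepsilon_2 - \varepsilon_1 + \varepsilon_3 - \varepsilon_2 + \cdots + \varepsilon_T - \varepsilon_{T-1}}{T-1}\right)^2\right] \\
&= E\left[\left(\frac{\varepsilon_T - \varepsilon_1}{T-1}\right)^2\right] = \frac{2\sigma^2}{(T-1)^2}
\end{aligned}
$$
The last step uses the independence of $\varepsilon_T$ and $\varepsilon_1$: the cross term has expectation zero, leaving $\sigma^2 + \sigma^2$ in the numerator.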
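Because both variances are proportional to the same $\sigma^2$, we can show that $\operatorname{Var}[\dot\beta] \ge \operatorname{Var}[\hat\beta_1]$ whenever
$$\frac{2\sigma^2}{(T-1)^2} \ge \frac{12\sigma^2}{T(T^2-1)} \iff T(T+1) \ge 6(T-1) \iff (T-2)(T-3) \ge 0.$$
This holds for every integer $T \ge 2$, with equality only at $T = 2$ and $T = 3$ (with two or three observations, the OLS slope is itself $(y_T - y_1)/(T-1)$). So for $T \ge 4$, $\operatorname{Var}[\dot\beta] > \operatorname{Var}[\hat\beta_1]$, meaning that OLS is more efficient than your friend's approach.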
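The comparison can also be checked exactly over a range of sample sizes. This Python sketch is illustrative: it codes the two closed-form variances, $12\sigma^2/(T(T^2-1))$ for OLS and $2\sigma^2/(T-1)^2$ for $\dot\beta$, with $\sigma^2 = 1$, and confirms that their ratio equals $T(T+1)/(6(T-1))$, never falls below one, and equals one only at $T = 2$ and $T = 3$.

```python
from fractions import Fraction


def var_ols(T, sigma2=1):
    """OLS slope variance on t = 1..T: 12 sigma^2 / (T (T^2 - 1))."""
    return Fraction(12 * sigma2, T * (T * T - 1))


def var_friend(T, sigma2=1):
    """Variance of the average first difference: 2 sigma^2 / (T - 1)^2."""
    return Fraction(2 * sigma2, (T - 1) ** 2)


ties = []
for T in range(2, 201):
    ratio = var_friend(T) / var_ols(T)
    # Algebraically the ratio is T(T+1) / (6(T-1)); it is >= 1 iff (T-2)(T-3) >= 0.
    assert ratio == Fraction(T * (T + 1), 6 * (T - 1))
    assert ratio >= 1
    if ratio == 1:
        ties.append(T)

print(ties)  # prints [2, 3]: the only sample sizes where the estimators tie
```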
It turns out that there is an easier way of demonstrating $\operatorname{Var}[\dot\beta] > \operatorname{Var}[\hat\beta_1]$. Consider for a moment the equation for $\dot\beta$:
$$\dot\beta = \frac{\sum_{t=2}^{T}(y_t - y_{t-1})}{T-1} = \frac{y_2 - y_1 + y_3 - y_2 + \cdots + y_T - y_{T-1}}{T-1} = \frac{y_T - y_1}{T-1}$$
If you think about it, $(y_T - y_1)/(T-1)$ is simply the change in $Y$ between the first and last observations, divided by one less than the total number of observations. However, since $t$ is on the x-axis in this case, this expression is also the rise divided by the run: the slope of the line connecting the first and last observations in our data set. Given this, it is easy to see why $\operatorname{Var}[\dot\beta] > \operatorname{Var}[\hat\beta_1]$ once $T \ge 4$: $\dot\beta$ is determined by the placement of only two points, while $\hat\beta_1$ is determined by the placement of all $T$ observations.
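The telescoping argument and the efficiency comparison can be illustrated together by simulation. In this Python sketch (the parameter values, sample size of 20, and seed are illustrative assumptions), the average first difference and the endpoint slope agree to rounding error on every sample. Across replications, the empirical variance of $\dot\beta$ comes out several times that of the OLS slope, close to the theoretical $2\sigma^2/(T-1)^2$ and $12\sigma^2/(T(T^2-1))$.

```python
import random
import statistics


def friend_estimator(y):
    """Average of first differences y_t - y_{t-1}, t = 2..T."""
    return sum(b - a for a, b in zip(y, y[1:])) / (len(y) - 1)


def endpoint_slope(y):
    """Rise over run between the first and last observations."""
    return (y[-1] - y[0]) / (len(y) - 1)


def ols_slope(y):
    """OLS slope from regressing y on t = 1..T with an intercept."""
    T = len(y)
    tbar = (T + 1) / 2
    ybar = sum(y) / T
    sxy = sum((t - tbar) * (yt - ybar) for t, yt in enumerate(y, start=1))
    sxx = sum((t - tbar) ** 2 for t in range(1, T + 1))
    return sxy / sxx


def compare(T=20, reps=4000, beta0=2.0, beta1=0.5, sigma=1.0, seed=2):
    """Return the empirical variances of both estimators and the largest
    gap seen between the average difference and the endpoint slope."""
    rng = random.Random(seed)
    friend, ols, max_gap = [], [], 0.0
    for _ in range(reps):
        y = [beta0 + beta1 * t + rng.gauss(0.0, sigma) for t in range(1, T + 1)]
        friend.append(friend_estimator(y))
        ols.append(ols_slope(y))
        max_gap = max(max_gap, abs(friend[-1] - endpoint_slope(y)))
    return statistics.variance(friend), statistics.variance(ols), max_gap


v_friend, v_ols, max_gap = compare()
theory_friend = 2 / (20 - 1) ** 2          # 2 sigma^2 / (T - 1)^2 with sigma = 1
theory_ols = 12 / (20 * (20 ** 2 - 1))     # 12 sigma^2 / (T (T^2 - 1)) with sigma = 1
print(v_friend, theory_friend, v_ols, theory_ols, max_gap)
```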
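2. Consider the model:
$$Y_t = \beta + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \phi_3 Y_{t-3} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2).$$
a. Under what condition(s) is $Y_t$ a stationary series? (4)

A necessary condition is that $1 - \phi_1 - \phi_2 - \phi_3 \neq 0$ (this will be obvious in part b). The full condition is that all roots of $1 - \phi_1 z - \phi_2 z^2 - \phi_3 z^3 = 0$ lie outside the unit circle; $1 - \sum\phi \neq 0$ simply rules out a unit root at $z = 1$.

b. When $Y_t$ is a stationary series, what is its mean? (4)

Taking expectations of both sides and making use of the stationarity assumption ($E[Y_t] = E[Y_{t-1}] = E[Y_{t-2}] = E[Y_{t-3}]$):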
$$E[Y_t] = \beta + \phi_1 E[Y_t] + \phi_2 E[Y_t] + \phi_3 E[Y_t] \quad\Longrightarrow\quad E[Y_t] = \frac{\beta}{1 - \phi_1 - \phi_2 - \phi_3}$$
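Parts a and b can be checked numerically. The Python sketch below is illustrative. `is_stationary` uses the step-down (inverse Durbin-Levinson) recursion, a swapped-in technique not discussed in the exam: it declares an AR(p) stationary exactly when every implied partial autocorrelation lies strictly between −1 and 1, which is equivalent to all roots of $1 - \phi_1 z - \cdots - \phi_p z^p$ lying outside the unit circle. `ar_mean` codes the formula from part b. The coefficient values in the usage lines are arbitrary.

```python
from fractions import Fraction


def is_stationary(phis):
    """Step-down test: stationary iff every reflection coefficient
    (partial autocorrelation) has absolute value strictly below one."""
    a = list(phis)                 # a[j] holds phi_{k, j+1} for the current order k
    while a:
        kappa = a[-1]              # phi_kk, the last coefficient at order k
        if abs(kappa) >= 1:
            return False
        k = len(a)
        # phi_{k-1, j} = (phi_{k, j} + phi_kk * phi_{k, k-j}) / (1 - phi_kk^2)
        a = [(a[j] + kappa * a[k - 2 - j]) / (1 - kappa ** 2) for j in range(k - 1)]
    return True


def ar_mean(beta, phis):
    """Mean of a stationary AR(p): beta / (1 - phi_1 - ... - phi_p)."""
    denom = 1 - sum(phis)
    if denom == 0:
        raise ValueError("1 - sum(phi) = 0: unit root, so the mean is undefined")
    return beta / denom


print(is_stationary([0.5, 0.2, 0.1]), ar_mean(1.0, [0.5, 0.2, 0.1]))
# Coefficients summing exactly to one give a unit root (exact fractions avoid rounding).
print(is_stationary([Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]))
```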
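c. When $Y_t$ is stationary, what are ρ(1) and ρ(2) for this series? (4)

Multiplying the model (in deviations from its mean) by $Y_{t-1}$ and by $Y_{t-2}$, taking expectations, and dividing by the variance gives the Yule-Walker equations (using ρ(0) = 1 and ρ(−k) = ρ(k)):
$$\rho(1) = \phi_1 + \phi_2\rho(1) + \phi_3\rho(2), \qquad \rho(2) = \phi_1\rho(1) + \phi_2 + \phi_3\rho(1),$$
so
$$\rho(1) = \frac{\phi_1 + \phi_2\phi_3}{1 - \phi_2 - \phi_1\phi_3 - \phi_3^2}$$
and
$$\rho(2) = \frac{\phi_1^2 + \phi_2 - \phi_2^2 + \phi_1\phi_3}{1 - \phi_2 - \phi_1\phi_3 - \phi_3^2}.$$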
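The closed forms can be checked against a direct solution of the two Yule-Walker equations. This Python sketch is illustrative, and the φ values in the usage loop are arbitrary stationary choices. It rewrites the equations as a 2×2 linear system, solves it by Cramer's rule, and compares the result with the formulas for ρ(1) and ρ(2).

```python
def yule_walker_rho12(phi1, phi2, phi3):
    """Solve the AR(3) Yule-Walker equations for rho(1), rho(2), written as
        (1 - phi2) rho1 - phi3 rho2 = phi1
        -(phi1 + phi3) rho1 + rho2  = phi2
    using Cramer's rule."""
    a11, a12, b1 = 1 - phi2, -phi3, phi1
    a21, a22, b2 = -(phi1 + phi3), 1.0, phi2
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


def closed_form_rho12(phi1, phi2, phi3):
    """The closed forms for rho(1) and rho(2) from part c."""
    d = 1 - phi2 - phi1 * phi3 - phi3 ** 2
    rho1 = (phi1 + phi2 * phi3) / d
    rho2 = (phi1 ** 2 + phi2 - phi2 ** 2 + phi1 * phi3) / d
    return rho1, rho2


for phis in [(0.5, 0.2, 0.1), (0.9, -0.3, 0.2), (-0.4, 0.1, 0.3)]:
    print(phis, yule_walker_rho12(*phis), closed_form_rho12(*phis))
```

Setting φ2 = φ3 = 0 recovers the familiar AR(1) results ρ(1) = φ1 and ρ(2) = φ1².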
d. Depending on your answers to the above three questions, discuss how you would forecast future values of $Y_t$. Explain why your process produces unbiased forecasts. (4)

Estimate the φs using either OLS or ARIMA (the two give nearly identical estimates in a sample of reasonable size) and then apply these φs to the correct lags of Y. The forecasts will be unbiased to the extent that the coefficient estimates are unbiased (strictly speaking, with lagged Y's as regressors OLS is consistent rather than unbiased, so this holds approximately in large samples) and because future values of the error term are, on average, zero.

3. I have posted 300 quarterly observations of a data set which I created on our website entitled "470 Midterm Monte Carlo Problem." Please forecast 2319q4 and 2320q1. Provide 95% confidence intervals for your forecasts. I will give extra credit to confidence intervals that are constructed by hand (i.e., not using Stata's stdf or stdp routines). I will give additional credit to answers that describe the process you used to arrive at your forecasts. (12)

I constructed this process as Y = −62.5 + .05t + 25q + X_t, where X_t = .8X_{t−1} + e_t and e_t is normally distributed with mean 0 and variance 100. The addition of X_t makes this an AR(1) process with a trend (of .05 units per period) and seasonality (Y rises 25 units with each successive quarter of the year). After creating a trend (gen t = _n) and a quarter variable (xi i.q or gen q = 1 if quarter ==1, etc.), I find:
(setting optimization to BHHH)
Iteration 0:   log likelihood = -1131.8087
Iteration 1:   log likelihood = -1131.7119
Iteration 2:   log likelihood = -1131.7022
Iteration 3:   log likelihood =   -1131.7
Iteration 4:   log likelihood = -1131.6994
(switching optimization to BFGS)
Iteration 5:   log likelihood = -1131.6993
Iteration 6:   log likelihood = -1131.6992

ARIMA regression

Sample:  2244q4 - 2319q3                        Number of obs     =        300
                                                Wald chi2(3)      =    4931.76
Log likelihood = -1131.699                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OPG
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           t |   .0983341   .0326942     3.01   0.003     .0342546    .1624135
           q |   25.26229   .3749582    67.37   0.000     24.52739    25.99719
       _cons |  -24.13648   6.080061    -3.97   0.000    -36.05318   -12.21978
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .7924468   .0345344    22.95   0.000     .7247606    .8601331
-------------+----------------------------------------------------------------
      /sigma |   10.50395   .4601936    22.83   0.000     9.601992    11.40592
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

A correlogram of these residuals gives:

. predict resid, resid

. corrgram resid

 LAG       AC       PAC       Q     Prob>Q
------------------------------------------
   1    0.0219   0.0219   .14509   0.7033
   2    0.0046   0.0040   .15153   0.9270
   3   -0.0154  -0.0157    .2239   0.9736
   4   -0.0465  -0.0462   .88468   0.9267
   5    0.0081   0.0107   .90507   0.9699
   6    0.0234   0.0232   1.0743   0.9826
   7   -0.0654  -0.0702   2.3985   0.9345
   8    0.0116   0.0119   2.4404   0.9645
   9    0.0498   0.0548   3.2125   0.9553
  10    0.0203   0.0224   3.3407   0.9722
  11    0.0487   0.0435   4.0838   0.9674
  12   -0.0266  -0.0300   4.3059   0.9773
  13    0.0217   0.0344   4.4549   0.9853
  14   -0.0581  -0.0669   5.5237   0.9771
  15    0.0490   0.0583   6.2877   0.9745
  16    0.0177   0.0194    6.388   0.9833
  17   -0.0935  -0.1078   9.1872   0.9342
  18    0.0541   0.0639   10.126   0.9277
  19   -0.0083  -0.0090   10.149   0.9492
  20    0.0066   0.0064   10.163   0.9651
  21   -0.0517  -0.0843    11.03   0.9622
  22   -0.0243  -0.0099   11.222   0.9714
  23    0.0060   0.0273   11.233   0.9807

I would forecast this by first removing the trend and seasonality and then forecasting what is left over:
. reg y t q

      Source |       SS       df       MS              Number of obs =     300
-------------+------------------------------           F(  2,   297) =  429.34
       Model |   257168.29     2  128584.145           Prob > F      =  0.0000
    Residual |   88950.118   297  299.495347           R-squared     =  0.7430
-------------+------------------------------           Adj R-squared =  0.7413
       Total |  346118.408   299  1157.58665           Root MSE      =  17.306

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |   .0976848   .0115374     8.47   0.000     .0749794    .1203902
           q |   25.09002   .8936776    28.08   0.000     23.33128    26.84876
       _cons |  -24.03389    3.00416    -8.00   0.000    -29.94603   -18.12175
------------------------------------------------------------------------------

. predict leftover, resid
(2 missing values generated)

. gen lleft = leftover[_n-1]
(2 missing values generated)

. reg leftover lleft

      Source |       SS       df       MS              Number of obs =     299
-------------+------------------------------           F(  1,   297) =  500.55
       Model |  55663.2605     1  55663.2605           Prob > F      =  0.0000
    Residual |  33027.3289   297  111.203128           R-squared     =  0.6276
-------------+------------------------------           Adj R-squared =  0.6264
       Total |  88690.5895   298  297.619428           Root MSE      =  10.545

------------------------------------------------------------------------------
    leftover |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lleft |   .7926116    .035427    22.37   0.000     .7228918    .8623314
       _cons |  -.0044739   .6098538    -0.01   0.994    -1.204656    1.195708
------------------------------------------------------------------------------

My forecasts for periods 301 (2319q4) and 302 (2320q1) are:
Y301 = −24.033 + .0976 × 301 + 25.09 × 4 − .0044 + .7922 × 18.60 = 120.44
Y302 = −24.033 + .0976 × 302 + 25.09 × 1 − .0044 + .7922 × 14.72 = 42.18
where 18.60 is the leftover in period 300 and 14.72 is the predicted value of the leftover in period 301. The standard errors of these forecasts are easy to write down but difficult to compute (because we are in a world of multiple independent variables). The forecast error is:
$$\varepsilon^f_{t+1} = Y_{t+1} - \hat Y_{t+1} = \beta_0 + \beta_1(t+1) + \beta_2 q + \beta_3 Y_t + \varepsilon_{t+1} - \big[\hat\beta_0 + \hat\beta_1(t+1) + \hat\beta_2 q + \hat\beta_3 Y_t\big]$$
So:
$$\varepsilon^f_{t+1} = (\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)(t+1) + (\beta_2 - \hat\beta_2) q + (\beta_3 - \hat\beta_3) Y_t + \varepsilon_{t+1}$$
In some ways, this is a nicer setup than what you usually see. Because we actually know the values of all the variables on the right-hand side of the equation (t, q, and $Y_t$), we won't have any forecast error arising from having to estimate these variables. If OLS and/or the ARIMA process produce unbiased estimates of the betas, and if the error term is mean zero (all of which are reasonable assumptions), the average forecast error is zero, or $E[\varepsilon^f_{t+1}] = 0$. The forecast variance is then:
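$$\sigma_f^2 = E\big[(\varepsilon^f_{t+1})^2\big] - \big(E[\varepsilon^f_{t+1}]\big)^2 = E\big[(\varepsilon^f_{t+1})^2\big]$$
$$\sigma_f^2 = E\Big[\big((\beta_0 - \hat\beta_0) + (\beta_1 - \hat\beta_1)(t+1) + (\beta_2 - \hat\beta_2) q + (\beta_3 - \hat\beta_3) Y_t + \varepsilon_{t+1}\big)^2\Big]$$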
With a little work, it is clear that the forecast variance is equal to:
$$\sigma_f^2 = \operatorname{Var}(\hat\beta_0) + \operatorname{Var}(\hat\beta_1)(t+1)^2 + \operatorname{Var}(\hat\beta_2) q^2 + \operatorname{Var}(\hat\beta_3) Y_t^2 + \sigma^2 + \text{(a bunch of covariances of all of these terms)}$$
Solving this involves knowing the form of the variances and covariances, or some considerable linear algebra. However, if we are willing to assume that our estimated coefficients equal the true coefficients (perhaps a heroic assumption), then all of the variances and covariances are zero and we are left with $\sigma^2$, which in our case was estimated at 10.503² = 110.313, not far from the 100 I used to create this data set. So, our confidence interval for period 301 is 120.44 ± 1.96 × 10.503 = {99.85, 141.03}.

Forecasting period 302 is slightly harder in that we have to forecast period 301 in order to forecast period 302. This means that any forecast error we make in period 301 is compounded into our forecast error for period 302. This is easy to see in a simple AR(1) forecast two periods out:
$$Y_{t+2} = \phi Y_{t+1} + \varepsilon_{t+2} = \phi(\phi Y_t + \varepsilon_{t+1}) + \varepsilon_{t+2}$$
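Since we know neither $\varepsilon_{t+2}$ nor $\varepsilon_{t+1}$ at time $t$ when we forecast $Y_{t+2}$, our forecast is $\hat Y_{t+2} = \phi^2 Y_t$. So our forecast error is $\varepsilon^f_{t+2} = Y_{t+2} - \hat Y_{t+2} = \phi\varepsilon_{t+1} + \varepsilon_{t+2}$, and because the two errors are independent, the forecast variance is $(1 + \phi^2)\sigma^2$.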
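The arithmetic for both forecasts and intervals can be collected in a few lines. This Python sketch plugs in the rounded estimates quoted in the text (the leftover of 18.60 for period 300 is taken from the text, and the variable names are illustrative) and applies the one- and two-step variances $\sigma^2$ and $(1+\phi^2)\sigma^2$, treating the estimates as the true coefficients as the answer does. Because it recomputes the period 301 leftover instead of using the rounded 14.72, its period 302 forecast differs from the text by a couple of hundredths.

```python
import math

# Rounded point estimates as quoted in the answer text.
b0, b_trend, b_q = -24.033, 0.0976, 25.09   # intercept, trend, quarter slope from `reg y t q`
c_left, phi = -0.0044, 0.7922               # constant and AR(1) slope used for the leftover series
sigma = 10.503                              # estimated standard deviation of e_t
left_300 = 18.60                            # leftover (detrended, deseasonalized) value in period 300

# Leftover forecasts: one step ahead, then feed the one-step forecast back in.
left_301 = c_left + phi * left_300
left_302 = c_left + phi * left_301

# Add back trend and season: period 301 is 2319q4 (q = 4), period 302 is 2320q1 (q = 1).
y_301 = b0 + b_trend * 301 + b_q * 4 + left_301
y_302 = b0 + b_trend * 302 + b_q * 1 + left_302

# Forecast standard errors, treating the estimates as the true parameters.
se_301 = sigma                               # sqrt(sigma^2)
se_302 = sigma * math.sqrt(1 + phi ** 2)     # sqrt((1 + phi^2) sigma^2)

z = 1.96
ci_301 = (y_301 - z * se_301, y_301 + z * se_301)
ci_302 = (y_302 - z * se_302, y_302 + z * se_302)
print(f"2319q4: {y_301:.2f}, 95% CI ({ci_301[0]:.2f}, {ci_301[1]:.2f})")
print(f"2320q1: {y_302:.2f}, 95% CI ({ci_302[0]:.2f}, {ci_302[1]:.2f})")
```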
In our case, the forecast variance would be (1 + .7922²) × 110.313 ≈ 179.5, a standard error of about 13.40, so our confidence interval for period 302 is 42.18 ± 1.96 × 13.40 = {15.92, 68.44}.

4. I have posted data from the St. Louis Federal Reserve Bank on the velocity of M2. You may find the original data set (and a short description of velocity) at: https://fred.stlouisfed.org/series/M2V. This is quarterly data beginning in 1959q1 and continuing through 2017q4. Please forecast 2018q1 and 2018q2. Provide 95% confidence intervals for your forecasts. I will give extra credit to confidence intervals that are constructed by hand (i.e., not using Stata's stdf or stdp routines). I will give additional credit to answers that describe the process you used to arrive at your forecasts. (12)

My initial correlogram screams AR(2):
. corrgram VM2

 LAG       AC       PAC       Q     Prob>Q
------------------------------------------
   1    0.9834   1.0035   231.14   0.0000
   2    0.9611  -0.4366   452.86   0.0000
   3    0.9357  -0.0457   663.91   0.0000
   4    0.9088  -0.0343   863.86   0.0000
   5    0.8807  -0.0856   1052.4   0.0000
   6    0.8518   0.0161   1229.6   0.0000
   7    0.8227  -0.0133   1395.7   0.0000
   8    0.7934  -0.0053   1550.7   0.0000
   9    0.7659   0.0699   1695.9   0.0000
  10    0.7393  -0.0452   1831.7   0.0000
  11    0.7118  -0.1350   1958.2   0.0000
  12    0.6831  -0.0371   2075.2   0.0000
  13    0.6559   0.1579   2183.5   0.0000

I estimate an AR(2):

. arima VM2, ar(1/2)

(setting optimization to BHHH)
Iteration 0:   log likelihood =  583.5577
Iteration 1:   log likelihood =  606.23791
Iteration 2:   log likelihood =  606.40951
Iteration 3:   log likelihood =  606.52021
Iteration 4:   log likelihood =  606.5392
(switching optimization to BFGS)
Iteration 5:   log likelihood =  606.58232
Iteration 6:   log likelihood =  606.58251
Iteration 7:   log likelihood =  606.58254
Iteration 8:   log likelihood =  606.58255

ARIMA regression

Sample:  1959q1 - 2017q4                        Number of obs     =        236
                                                Wald chi2(2)      =   18490.81
Log likelihood =  606.5825                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OPG
         VM2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
VM2          |
       _cons |   1.733572   .1865255     9.29   0.000     1.367989    2.099155
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   1.431532   .0491312    29.14   0.000     1.335236    1.527827
         L2. |  -.4388779   .0495756    -8.85   0.000    -.5360443   -.3417115
-------------+----------------------------------------------------------------
      /sigma |   .0183191    .000587    31.21   0.000     .0171686    .0194696
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

One thing that bothers me some is that the coefficients on the two AR terms sum to almost one (1.4315 − .4389 ≈ .993). This also screams something to me: unit root (see the answer to 2a of this exam). When I look at the residuals from this model, I find something that looks like white noise:
. predict resid, resid

. corrgram resid

 LAG       AC       PAC       Q     Prob>Q
------------------------------------------
   1   -0.0079  -0.0079   .01492   0.9028
   2    0.0147   0.0146   .06706   0.9670
   3    0.0200   0.0204   .16378   0.9832
   4    0.1110   0.1109   3.1445   0.5339
   5    0.0030   0.0047   3.1466   0.6774
   6    0.0198   0.0142   3.2425   0.7778
   7    0.0305   0.0294   3.4713   0.8383
   8   -0.0529  -0.0662   4.1604   0.8424
   9   -0.0291  -0.0352   4.3699   0.8854
  10    0.0986   0.0972   6.7882   0.7453
  11    0.1421   0.1447   11.828   0.3767
  12   -0.1437  -0.1387   17.004   0.1494
  13    0.0181   0.0102   17.086   0.1954
  14    0.0687   0.0621   18.282   0.1942
  15   -0.0203  -0.0474   18.387   0.2429

At this point, I'm close to being done. We might add some quarterly dummies (to check for seasonality) and a time trend. When I do this, nothing changes: none of the trend or quarterly terms is close to significant, and the AR coefficients barely move.

. arima VM2 time _Iq_2-_Iq_4, ar(1/2)

(setting optimization to BHHH)
Iteration 0:   log likelihood =  579.97125
Iteration 1:   log likelihood =  605.3069
Iteration 2:   log likelihood =  605.49486
Iteration 3:   log likelihood =  605.69331
Iteration 4:   log likelihood =  607.05396
(switching optimization to BFGS)
Iteration 5:   log likelihood =  607.31735
Iteration 6:   log likelihood =  607.47393
Iteration 7:   log likelihood =  607.57327
Iteration 8:   log likelihood =  607.5756
Iteration 9:   log likelihood =  607.57757
Iteration 10:  log likelihood =  607.57796
Iteration 11:  log likelihood =  607.57798
Iteration 12:  log likelihood =  607.57798

ARIMA regression

Sample:  1959q1 - 2017q4                        Number of obs     =        236
                                                Wald chi2(6)      =   18770.18
Log likelihood =   607.578                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |                 OPG
         VM2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
VM2          |
        time |  -.0009805   .0022242    -0.44   0.659    -.0053399     .003379
       _Iq_2 |   .0018009   .0018879     0.95   0.340    -.0018994    .0055012
       _Iq_3 |    .001086   .0021982     0.49   0.621    -.0032223    .0053943
       _Iq_4 |   -.000392   .0018215    -0.22   0.830    -.0039622    .0031781
       _cons |   1.849891   .3939403     4.70   0.000     1.077783       2.622
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   1.428297   .0519387    27.50   0.000     1.326499    1.530096
         L2. |   -.435679   .0523995    -8.31   0.000    -.5383802   -.3329779
-------------+----------------------------------------------------------------
      /sigma |   .0182422   .0006322    28.85   0.000     .0170031    .0194813
------------------------------------------------------------------------------
Note: The test of the variance against zero is one sided, and the two-sided
      confidence interval is truncated at zero.

So, I return to the original model to calculate forecast variances.
In an AR(2) model, my forecast of $Y_{t+1}$ is $\beta_0 + \phi_1 Y_t + \phi_2 Y_{t-1}$, while the actual $Y_{t+1}$ is $\beta_0 + \phi_1 Y_t + \phi_2 Y_{t-1} + \varepsilon_{t+1}$. Assuming my estimated φs equal the actual φs, my forecast error is $\varepsilon_{t+1}$ and my forecast variance is $\sigma^2$. Using my preferred model, that is simply .0183² ≈ .0003. My forecast of $Y_{t+2}$ is a little more complicated for the same reason discussed in the answer to problem #3 of this exam: $Y_{t+2} = \beta_0 + \phi_1 Y_{t+1} + \phi_2 Y_t + \varepsilon_{t+2}$, but I don't observe $Y_{t+1}$ at the time I am forecasting two periods into the future. So, I actually have to forecast $Y_{t+2}$ using
$$\hat Y_{t+2} = \beta_0 + \phi_1(\beta_0 + \phi_1 Y_t + \phi_2 Y_{t-1}) + \phi_2 Y_t = (1 + \phi_1)\beta_0 + (\phi_1^2 + \phi_2) Y_t + \phi_1\phi_2 Y_{t-1}.$$
The actual value is
$$Y_{t+2} = (1 + \phi_1)\beta_0 + (\phi_1^2 + \phi_2) Y_t + \phi_1\phi_2 Y_{t-1} + \phi_1\varepsilon_{t+1} + \varepsilon_{t+2}.$$
The difference between my forecast and the actual value, the forecast error, is $\phi_1\varepsilon_{t+1} + \varepsilon_{t+2}$, so the forecast variance must be $(1 + \phi_1^2)\sigma^2$. Using my preferred model, this is (1 + 1.4315²) × .0183² ≈ .0010, a standard error of about .0320. In this problem, we're asked to forecast the next two time periods. The last two in the data set are:
              t     VM2
  235.   2017q3   1.427
  236.   2017q4   1.431

One thing I need to remember is that Stata's arima command with AR terms does not report the intercept, but instead the estimated mean of the series. Thus, the "constant" of 1.73 reported above isn't the intercept but the mean of VM2. To be clear, the mean of a stationary AR(2) process is given by:
$$E[Y] = \beta_0 + \phi_1 E[Y] + \phi_2 E[Y] \quad\Longrightarrow\quad E[Y] = \frac{\beta_0}{1 - \phi_1 - \phi_2}.$$
In our case, Stata tells us that E[Y] = 1.7336, and since it also reports φ1 and φ2, we can figure out that β0 = 1.7336 × (1 − 1.431532 + .4388779) = 1.7336 × .0073459 ≈ .0127. (Because 1 − φ1 − φ2 is so small, it pays to carry full precision in this step.) So, my forecasts for the next two periods are:
2018q1: .0127 + 1.4315 × 1.431 − .4389 × 1.427 ≈ 1.435
2018q2: .0127 + 1.4315 × 1.435 − .4389 × 1.431 ≈ 1.439
Using the forecast standard errors σ̂ ≈ .0183 (one step) and σ̂√(1 + φ1²) ≈ .0320 (two steps), my confidence intervals are:
2018q1: 1.435 ± 1.96 × .0183 = {1.399, 1.471}
2018q2: 1.439 ± 1.96 × .0320 = {1.376, 1.502}
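For reference, the whole problem 4 calculation fits in a short Python sketch. It uses the full-precision estimates from the `arima VM2, ar(1/2)` output and the 2017q3 and 2017q4 values of VM2, and it treats the estimates as the true parameters, as the answer does. The variable names are illustrative. Carrying full precision matters here because 1 − φ1 − φ2 is so small that rounding the implied intercept visibly moves the forecasts.

```python
import math

# Estimates from `arima VM2, ar(1/2)` as reported in the Stata output.
mu = 1.733572        # Stata's _cons: the estimated mean of VM2, not the intercept
phi1 = 1.431532
phi2 = -0.4388779
sigma = 0.0183191

y_q3, y_q4 = 1.427, 1.431   # VM2 in 2017q3 and 2017q4

beta0 = mu * (1 - phi1 - phi2)             # intercept implied by E[Y] = beta0 / (1 - phi1 - phi2)

f_q1 = beta0 + phi1 * y_q4 + phi2 * y_q3   # 2018q1: one step ahead
f_q2 = beta0 + phi1 * f_q1 + phi2 * y_q4   # 2018q2: uses the 2018q1 forecast in place of Y_{t+1}

se_1 = sigma                               # sqrt(sigma^2)
se_2 = sigma * math.sqrt(1 + phi1 ** 2)    # sqrt((1 + phi1^2) sigma^2)

z = 1.96
for label, f, se in (("2018q1", f_q1, se_1), ("2018q2", f_q2, se_2)):
    print(f"{label}: {f:.3f}  95% CI ({f - z * se:.3f}, {f + z * se:.3f})")
```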