8/3/2019 Metrics Guide 1
http://slidepdf.com/reader/full/metrics-guide-1 1/29
Study Guide for Econometrics
Unit 0: Preliminaries
Data types
Cross-sectional data
Time-series data
Panel data (or longitudinal data)
Total sum of squares: TSS = ∑(yi − ȳ)² = (N − 1) ⋅ var(y)
Model sum of squares: MSS = ∑(ŷi − ȳ)² = (N − 1) ⋅ var(ŷ)
Residual sum of squares: RSS = ∑(ŷi − yi)² = (N − 1) ⋅ var(ê)
Coefficient of determination: R² = MSS/TSS = 1 − RSS/TSS = corr(y, ŷ)²
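These identities can be checked numerically. A minimal Python/NumPy sketch (not part of the original Stata examples; the simulated data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# OLS fit of y on a constant and x
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

tss = np.sum((y - y.mean()) ** 2)     # equals (N-1)*var(y)
mss = np.sum((yhat - y.mean()) ** 2)  # equals (N-1)*var(yhat)
rss = np.sum((y - yhat) ** 2)         # equals (N-1)*var(ehat)

r2_a = mss / tss
r2_b = 1 - rss / tss
r2_c = np.corrcoef(y, yhat)[0, 1] ** 2
print(r2_a, r2_b, r2_c)  # all three definitions of R-squared agree
```

With an intercept in the model, all three expressions for R² coincide exactly, and TSS = MSS + RSS.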
Interpretation
Quality of model (useful, but incorrect)
Comparison between models
Hypothesis testing
Univariate: H0: βj = βj*;  HA: βj ≠ βj*.
Test statistic t* = (β̂j − βj*)/st.err.(β̂j) has a t-distribution with N − k d.o.f.
Multivariate
Null hypothesis that a set of βs take on particular values; alternative that at least one of them does not.
Test statistic has F-distribution.
Example of Stata commands:
reg y x1 x2 x3           OLS regression
test x1 = 2.13           Test of individual hypothesis
test x2 = -5.9, accum    Joint test of hypotheses
test x2 x3               Joint test of equaling zero
. reg y x1 x2 x3
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) = 1476.46
       Model |  5241.10184     3  1747.03395           Prob > F      =  0.0000
    Residual |  113.592501    96  1.18325521           R-squared     =  0.9788
-------------+------------------------------           Adj R-squared =  0.9781
       Total |  5354.69434    99  54.0878216           Root MSE      =  1.0878

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.983209    .109285    18.15   0.000      1.76628    2.200137
          x2 |  -7.048816   .1111005   -63.45   0.000    -7.269349   -6.828283
          x3 |   .0388324    .107991     0.36   0.720    -.1755282    .2531929
       _cons |   3.109514   .1091322    28.49   0.000     2.892888     3.32614
------------------------------------------------------------------------------
. test x1 = 2.13
( 1) x1 = 2.13
       F(  1,    96) =    1.80
            Prob > F =    0.1824

. test x2 = -5.9, accum

 ( 1)  x1 = 2.13
 ( 2)  x2 = -5.9

       F(  2,    96) =   55.42
            Prob > F =    0.0000

. test x2 x3

 ( 1)  x2 = 0
 ( 2)  x3 = 0

       F(  2,    96) = 2076.70
            Prob > F =    0.0000
Top-left table: “SS” column contains the MSS, RSS, and TSS. Disregard “df” and “MS”.
Top-right table: number of observations; F-statistic for the “overall significance” of the regression (testing the hypothesis that all of the explanatory variables have zero effect); p-value of this hypothesis; R² of the regression. Disregard “Adj R-squared” and “Root MSE”.
Bottom table: “Coef.” column contains estimates of β; the next column has standard errors of each β̂; then the t-statistic testing the hypothesis that this variable has zero effect; then the p-value of this test; finally, a 95% confidence interval for the estimated coefficient.
Unit 2: Data Concerns
Collinearity
Perfect collinearity: one explanatory variable is a linear function of others.
Implication: ˆβ cannot be estimated.
Solution: Drop one variable; modify interpretation.
Near collinearity: high correlation between explanatory variables.
Implication: ˆβ has large standard errors.
Solutions: Dropping variables (discouraged); change nothing, but focus on joint significance (preferred).
Specification
Rescaling variables: no theoretical difference (some practical concerns) {6.1}
Omitted variables: Omitting x3 from the model causes
E[β̂2] = β2 + (cov(x2, x3)/var(x2)) ⋅ β3  (“omitted variable bias”)
Irrelevant variables: Including irrelevant x3 introduces no bias in estimation of β, and E[β̂3] = β3 = 0.
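The omitted variable bias formula can be verified by simulation; a Python/NumPy sketch (coefficients and data are illustrative, not from the guide):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)           # x3 correlated with x2
y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

# "Short" regression: omit x3 from the model
X = np.column_stack([np.ones(n), x2])
b_short = np.linalg.lstsq(X, y, rcond=None)[0]

# Predicted bias: (cov(x2, x3)/var(x2)) * beta3
bias = np.cov(x2, x3)[0, 1] / np.var(x2, ddof=1) * 3.0
print(b_short[1], 2.0 + bias)  # short-regression slope matches beta2 + bias
```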
Qualitative variables
Dummy variables: values of 0 or 1, depending on whether a condition is met.
Categorical variables: convert to a series of dummy variables; omit the “reference” category.
Nonlinear models
Common nonlinear specifications
Quadratics (for changing marginal effects)
yi = β1 + β2xi + β3xi² + ei;  Δy/Δx = β2 + 2β3xi.
Logarithms (for percentage changes and elasticities)
yi = β1 + β2ln(xi) + ei;  β2 ≈ Δy/%Δx.
ln(yi) = β1 + β2xi + ei;  β2 ≈ %Δy/Δx.
ln(yi) = β1 + β2ln(xi) + ei;  β2 ≈ %Δy/%Δx.
Interactions (for complementarities)
yi = β1 + β2x2i + β3x3i + β4(x2i ⋅ x3i);  Δy/Δx2 = β2 + β4x3i and Δy/Δx3 = β3 + β4x2i.
Interactions with dummy variables
Choosing a specification
Economic theory (preferred)
Eyeballing data
Comparison of R2 values (dangerous)
Testing a specification
Simple: inclusion of higher order terms
Ramsey’s Econometric Specification Error Test (RESET)
Dangers of “data mining” (and specification mining)
Classical measurement error
True model: yi = β1 + β2xi + ei, but x̃i = xi + mi is measured instead.
“Classical”: E[mi | everything] = 0.
Implication: E[β̂OLS] = β ⋅ var(x)/(var(x) + var(m))  (“attenuation bias”; “bias toward zero”)
Special case: tests of H0: β2 = 0 unaffected.
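Attenuation bias can be seen in a short simulation; a Python/NumPy sketch (variances chosen for illustration, not from the guide):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)              # true regressor, var(x) = 1
m = rng.normal(size=n)              # classical measurement error, var(m) = 1
y = 2.0 * x + rng.normal(size=n)
x_obs = x + m                       # mismeasured regressor

# OLS slope of y on the mismeasured x
slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)

# Predicted attenuation: beta * var(x) / (var(x) + var(m)) = 2 * 1/2 = 1
attenuated = 2.0 * 1.0 / (1.0 + 1.0)
print(slope, attenuated)  # slope shrinks toward zero
```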
Unusual observations (“outliers”)
Implication: OLS is highly sensitive to extreme values of yi.
Solutions:
Dropping outliers (dangerous)
Least absolute deviations estimator: β̂LAD to min ∑|yi − xiβ|.
No adjustments (recommended)
Interpretation of OLS results
Experimental data: researcher manipulates x values.
Correlation can be interpreted as causal effect.
Empirical data: generated through real-world processes
Factors contributing to observed correlation, aside from effect of x on y:
Unobserved heterogeneity
Reverse causality
Selection
Ramsey Econometric Specification Test
. reg y x1 x2 x3
      Source |       SS       df       MS              Number of obs =    2134
-------------+------------------------------           F(  3,  2130) =  437.98
       Model |  35040.6408     3  11680.2136           Prob > F      =  0.0000
    Residual |  56803.8176  2130   26.668459           R-squared     =  0.3815
-------------+------------------------------           Adj R-squared =  0.3807
       Total |  91844.4584  2133  43.0588178           Root MSE      =  5.1642

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -1.185214   .1127196   -10.51   0.000    -1.406266   -.9641618
          x2 |   2.081589   .1122702    18.54   0.000     1.861418    2.301759
          x3 |   -3.18763   .1100042   -28.98   0.000    -3.403357   -2.971904
       _cons |  -.5286567    .111807    -4.73   0.000    -.7479189   -.3093945
------------------------------------------------------------------------------
. predict yhat
(option xb assumed; fitted values)
. gen yhat2 = yhat^2
. gen yhat3 = yhat^3
. gen yhat4 = yhat^4
. gen yhat5 = yhat^5
. reg y x1 x2 x3 yhat2 yhat3 yhat4 yhat5
      Source |       SS       df       MS              Number of obs =    2134
-------------+------------------------------           F(  7,  2126) =  191.66
       Model |  35535.1167     7  5076.44524           Prob > F      =  0.0000
    Residual |  56309.3417  2126  26.4860497           R-squared     =  0.3869
-------------+------------------------------           Adj R-squared =  0.3849
       Total |  91844.4584  2133  43.0588178           Root MSE      =  5.1465

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -1.281567    .128619    -9.96   0.000    -1.533799   -1.029335
          x2 |   2.237487   .1578664    14.17   0.000     1.927898    2.547076
          x3 |  -3.419771   .1976777   -17.30   0.000    -3.807433   -3.032109
       yhat2 |   -.007986   .0107457    -0.74   0.457    -.0290591    .0130871
       yhat3 |  -.0030688   .0018225    -1.68   0.092    -.0066428    .0005051
       yhat4 |  -.0001027   .0001054    -0.97   0.330    -.0003094     .000104
       yhat5 |   .0000144   .0000111     1.29   0.197    -7.49e-06    .0000362
       _cons |   -.361512   .1569125    -2.30   0.021    -.6692299    -.053794
------------------------------------------------------------------------------
. test yhat2 yhat3 yhat4 yhat5
 ( 1)  yhat2 = 0
 ( 2)  yhat3 = 0
 ( 3)  yhat4 = 0
 ( 4)  yhat5 = 0

       F(  4,  2126) =    4.67
            Prob > F =    0.0009
Note: Although none of the “yhat” terms is individually significant at conventional levels in this example, they are jointly highly significant.
Unit 3: Weighted and Generalized Least Squares Regression
Heteroskedasticity: E[ei² | xi] = σi² ≠ σ².
OLS unbiased as long as E[e | X] = 0 holds.
Variance calculation incorrect.
Robust standard errors: var(β̂OLS) = (X′X)⁻¹ X′ diag(êi²) X (X′X)⁻¹.
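The sandwich formula can be computed directly; a Python/NumPy sketch with simulated heteroskedastic data (illustrative only, not the Stata implementation):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n) * (1 + np.abs(x))  # error variance grows with |x|
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
ehat = y - X @ beta

# White (HC0) sandwich: (X'X)^-1 [sum ehat_i^2 x_i x_i'] (X'X)^-1
meat = X.T @ (X * ehat[:, None] ** 2)
V_robust = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(V_robust))

# Conventional OLS variance for comparison
V_ols = XtX_inv * (ehat @ ehat) / (n - 2)
se_ols = np.sqrt(np.diag(V_ols))
print(se_ols, se_robust)
```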
OLS inefficient.
Testing for heteroskedasticity
White test
Breusch-Pagan test
Generalized Least Squares (GLS)
Objective: pick β̂GLS to min ê′Ωê, for some symmetric N × N matrix Ω.
Estimator: β̂GLS = (X′ΩX)⁻¹(X′ΩY).
Unbiasedness: if E[e X] = 0 , then any GLS estimator is unbiased.
Most efficient: Ω = (E[ee′])⁻¹.
Special cases of GLS:
Weighted least squares (WLS): Ω is a diagonal matrix; most efficient with heteroskedasticity (and no cross-correlation).
Ordinary least squares (OLS): Ω is the identity matrix; most efficient with homoskedasticity (and no cross-correlation).
Feasible Generalized Least Squares (FGLS)
Problem: In practice, Ω is unknown.
Solution: Use OLS to predict e ; then calculate ˆΩ ; use in place of unknown Ω .
Estimator: β̂FGLS = (X′Ω̂X)⁻¹(X′Ω̂Y).
Examples of Stata commands:
reg y x1 x2 x3                   OLS regression
hettest                          Breusch-Pagan test
reg y x1 x2 x3, robust           OLS regression with robust st. errors
reg y x1 x2 x3 [weight=omega]    WLS regression
. reg y x1 x2 x3
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  3,    96) = 1227.70
       Model |  5190.30582     3  1730.10194           Prob > F      =  0.0000
    Residual |  135.285436    96  1.40922329           R-squared     =  0.9746
-------------+------------------------------           Adj R-squared =  0.9738
       Total |  5325.59125    99   53.793851           Root MSE      =  1.1871

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.929843   .1192645    16.18   0.000     1.693104    2.166581
          x2 |   -7.02553   .1212458   -57.94   0.000    -7.266201   -6.784859
          x3 |   .0407538   .1178524     0.35   0.730    -.1931813    .2746889
       _cons |   2.985645   .1190978    25.07   0.000     2.749238    3.222053
------------------------------------------------------------------------------
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of y

         chi2(1)      =     9.38
         Prob > chi2  =   0.0022
. reg y x1 x2 x3, robust
Linear regression                                      Number of obs =     100
                                                       F(  3,    96) =  812.97
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9746
                                                       Root MSE      =  1.1871

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.929843   .1055009    18.29   0.000     1.720425     2.13926
          x2 |   -7.02553    .179712   -39.09   0.000    -7.382255   -6.668804
          x3 |   .0407538   .2123003     0.19   0.848     -.380659    .4621666
       _cons |   2.985645   .1202126    24.84   0.000     2.747025    3.224266
------------------------------------------------------------------------------
Weighted Least Squares for heteroskedasticity
. reg y x1 x2 x3
      Source |       SS       df       MS              Number of obs =     500
-------------+------------------------------           F(  3,   496) =   12.46
       Model |  15297.6754     3  5099.22513           Prob > F      =  0.0000
    Residual |  203043.149   496  409.361188           R-squared     =  0.0701
-------------+------------------------------           Adj R-squared =  0.0644
       Total |  218340.825   499  437.556763           Root MSE      =  20.233

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.646955   .8512642     4.28   0.000     1.974427    5.319483
          x2 |  -3.964182   .9390897    -4.22   0.000    -5.809266   -2.119098
          x3 |   .6393417   .8702385     0.73   0.463    -1.070467     2.34915
       _cons |   3.094226   .9082631     3.41   0.001     1.309709    4.878744
------------------------------------------------------------------------------
. predict ehat, resid
. gen ehat2 = ehat^2
. gen x1sq = x1^2

. gen x2sq = x2^2
. gen x3sq = x3^2
. gen x1x2 = x1*x2
. gen x1x3 = x1*x3
. gen x2x3 = x2*x3
. quietly reg ehat2 x1 x2 x3 x1sq x2sq x3sq x1x2 x1x3 x2x3
. predict ehat2hat
(option xb assumed; fitted values)

. gen omega = 1/(ehat2hat)^.5
(204 missing values generated)

. reg y x1 x2 x3 [weight = omega]
(analytic weights assumed)
(sum of wgt is 2.0025e+01)
      Source |       SS       df       MS              Number of obs =     296
-------------+------------------------------           F(  3,   292) =   14.68
       Model |  11698.2506     3  3899.41685           Prob > F      =  0.0000
    Residual |  77554.2174   292  265.596635           R-squared     =  0.1311
-------------+------------------------------           Adj R-squared =  0.1221
       Total |   89252.468   295  302.550739           Root MSE      =  16.297

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   3.223748   .9682773     3.33   0.001     1.318061    5.129435
          x2 |  -5.863431   .9979716    -5.88   0.000     -7.82756   -3.899302
          x3 |   1.226024   .9680623     1.27   0.206    -.6792399    3.131288
       _cons |    3.44261   1.005671     3.42   0.001     1.463327    5.421893
------------------------------------------------------------------------------
Note: Some of the predicted variances were negative, so the weights could not be calculated for these observations, and they were dropped from the WLS regression. The smaller sample size hides some of the increase in precision.
Unit 4: Instrumental Variables Regression
Endogeneity: E[e | X] ≠ 0.
OLS with endogenous regressors: E[β̂OLS] = β + (X′X)⁻¹(X′e); biased.
Instrumental variable: has ability to predict endogenous regressors. Assumptions/requirements for the instrument:
At least as many instruments as explanatory variables: #Z ≥ #X.
Note: if xj is exogenous, it is technically used as an instrument for itself.
Uncorrelated with unobservables: E[e | Z] = 0 ⇒ E[Z′e] = 0. (E[e | X] ≠ 0 usually, but not necessarily.)
Correlated with the endogenous explanatory variables: (Z′X) is invertible when there are the same number of instruments. (Generally: (Z′X) is of full rank.)
Two-Stage Least Squares (2SLS)
First stage: X = Zγ + u ⇒ γ̂ = (Z′Z)⁻¹(Z′X) ⇒ X̂ = Zγ̂.
Second stage: regression of Y on X̂ yields β̂2SLS = (X̂′X)⁻¹(X̂′Y).
(Standard errors incorrect)
Instrumental Variables Regression: direct computation
β̂2SLS = (X̂′X)⁻¹(X̂′Y) is equivalent to (Z′X)⁻¹(Z′Y) = β̂IV (when #Z = #X).
Variance in estimator:
Estimated variance: var(β̂IV) = (Z′X)⁻¹(Z′Z)(X′Z)⁻¹ ⋅ σ̂e².
Inefficiency: var(β̂IV) = var(β̂OLS)/corr(x, z)², when #X = #Z = 1.
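A simulation sketch of OLS bias under endogeneity and the direct IV computation β̂IV = (Z′X)⁻¹(Z′Y) (Python/NumPy; the data-generating process is illustrative, not from the guide):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # unobserved confounder
x = z + u + rng.normal(size=n)         # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)   # u enters both: E[e | x] != 0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)  # (Z'X)^-1 (Z'Y), #Z = #X case
print(b_ols[1], b_iv[1])  # OLS slope biased away from 2; IV slope near 2
```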
Post-estimation tests
Hausman test: explanatory variables are endogenous.
Motivation: because of efficiency, OLS is preferred to IV if no endogeneity.
Null hypothesis: E[β̂IV] = β = E[β̂OLS], var[β̂IV] ≥ var[β̂OLS].
Alternative hypothesis: E[β̂IV] = β ≠ E[β̂OLS].
Check for weak instruments: cov(x ,z) ≈ 0 ?
Motivation: with weak instruments, very inefficient; biases get magnified;also, distribution of estimator may not be approximately normal.
Correlations; F-statistics from first stage.
Examples of Stata commands:
IV regression: x1 and x2 are exogenous, x3 and x4 are endogenous, and z1, z2, and z3 (plus x1 and x2) are instruments.
ivreg y x1 x2 (x3 x4 = z1 z2 z3)
IV regression, displaying first-stage results.
ivreg y x1 x2 (x3 x4 = z1 z2 z3), first
Hausman test
ivreg y x1 x2 (x3 x4 = z1 z2 z3)
est sto ivest
reg y x1 x2 x3 x4
est sto olsest
hausman ivest olsest
Test of over-identification
ivreg y x1 x2 (x3 x4 = z1 z2 z3)
predict ehat, resid
reg ehat x1 x2 z1 z2 z3
. ivreg y x1 x2 (x3 x4 = z1 z2 z3)
Instrumental variables (2SLS) regression
      Source |       SS       df       MS              Number of obs =    1234
-------------+------------------------------           F(  4,  1229) =  167.49
       Model |  19929.9571     4  4982.48929           Prob > F      =  0.0000
    Residual |  89870.8488  1229  73.1251821           R-squared     =  0.1815
-------------+------------------------------           Adj R-squared =  0.1788
       Total |  109800.806  1233  89.0517486           Root MSE      =  8.5513

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x3 |  -1.771296   .4852945    -3.65   0.000    -2.723393   -.8191982
          x4 |   4.056543   .1996468    20.32   0.000     3.664856    4.448229
          x1 |   .9336217     .25295     3.69   0.000     .4373601    1.429883
          x2 |   2.284979   .2538783     9.00   0.000     1.786896    2.783062
       _cons |  -5.534924   .2459335   -22.51   0.000     -6.01742   -5.052428
------------------------------------------------------------------------------
Instrumented:  x3 x4
Instruments:   x1 x2 z1 z2 z3
------------------------------------------------------------------------------
. est sto ivest
. reg y x1 x2 x3 x4
      Source |       SS       df       MS              Number of obs =    1234
-------------+------------------------------           F(  4,  1229) =  314.22
       Model |  55516.0702     4  13879.0176           Prob > F      =  0.0000
    Residual |  54284.7358  1229   44.169842           R-squared     =  0.5056
-------------+------------------------------           Adj R-squared =  0.5040
       Total |  109800.806  1233  89.0517486           Root MSE      =   6.646

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.059869   .1963066     5.40   0.000     .6747357    1.445002
          x2 |   2.661992   .1938744    13.73   0.000     2.281631    3.042354
          x3 |    .785674   .1315465     5.97   0.000     .5275934    1.043755
          x4 |   2.795138   .0945471    29.56   0.000     2.609647     2.98063
       _cons |  -5.416561   .1896704   -28.56   0.000    -5.788675   -5.044447
------------------------------------------------------------------------------
. est sto olsest
. hausman ivest olsest
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     ivest        olsest       Difference          S.E.
-------------+----------------------------------------------------------------
          x3 |   -1.771296      .785674        -2.55697        .4671255
          x4 |    4.056543     2.795138        1.261404          .17584
          x1 |    .9336217     1.059869       -.1262471        .1595225
          x2 |    2.284979     2.661992        -.377013        .1639113
------------------------------------------------------------------------------
                          b = consistent under Ho and Ha; obtained from ivreg
           B = inconsistent under Ha, efficient under Ho; obtained from regress

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       88.01
                Prob>chi2 =      0.0000
Unit 5: Systems of Equations
Endogenous: the value of yi is determined in part by other variables in the system.
Exogenous: the value of xi is determined by unrelated, outside forces.
Simultaneity bias
Model: y1i = β1 + β2y2i + …;  y2i = γ1 + γ2y1i + …
OLS estimates of β and γ are biased.
2SLS for systems of equations with endogenous regressors
Model: y1i = β1 + β2y2i + β3x1i + β4x2i + e1i;  y2i = γ1 + γ2y1i + γ3x1i + γ4x3i + e2i.
Some overlap in exogenous explanatory variables.
Some endogenous variables are predictors of others.
Two or more equations.
Assumptions: all x variables are exogenous.
“Identification”: to measure the effect of yi on other outcomes, must have one variable in this equation that does not appear in others.
Technique:
1. Regress each endogenous explanatory variable yi on its exogenous determinants; obtain predicted values, ŷi.
2. Regress each outcome on its exogenous explanatory determinants and the predicted values of the endogenous variables.
Note: incorrect standard errors.
Inefficiency: does not take advantage of correlation between an individual’s unobservables in different equations.
Seemingly Unrelated Regression (SUR) for systems with exogenous regressors
Model: y1i = x1iβ1 + e1i;  y2i = x2iβ2 + e2i;  etc.
Some (or complete) overlap in explanatory variables.
Two or more equations.
OLS estimation of each equation separately is unbiased, but inefficient.
Motivation for SUR: accounting for cross-equation correlation in unobservables can yield more precise estimates.
Technique: FGLS.
3SLS for systems of equations with endogenous regressors
Model: y1i = β1 + β2y2i + β3x1i + β4x2i + e1i;  y2i = γ1 + γ2y1i + γ3x1i + γ4x3i + e2i.
Some overlap in exogenous explanatory variables.
Some endogenous variables are predictors of others.
Two or more equations.
Motivation: Correct for simultaneity bias, plus improve precision.
Technique: 2SLS combined with FGLS.
“Identification”: to measure the effect of yi on other outcomes, must have one variable in this equation that does not appear in others.
Efficiency: most efficient estimator.
Examples of Stata commands:
. reg3 (y1 = x1 x2) (y2 = x1 x3), sur
Seemingly unrelated regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
y1               1000      2    14.30263    0.0384      44.97   0.0000
y2               1000      2    14.02793    0.0783     106.18   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   1.991289   .4480522     4.44   0.000     1.113123    2.869455
          x2 |  -1.972894   .3969844    -4.97   0.000    -2.750969   -1.194818
       _cons |   5.011258   .4524773    11.08   0.000     4.124419    5.898097
-------------+----------------------------------------------------------------
y2           |
          x1 |    -3.0162   .4393652    -6.86   0.000    -3.877339    -2.15506
          x3 |   3.050422   .3966516     7.69   0.000     2.272999    3.827845
       _cons |   6.868261   .4444411    15.45   0.000     5.997172    7.739349
------------------------------------------------------------------------------
. reg3 (y1 = x1 x2 y2) (y2 = x1 x3 y1)
Three-stage least-squares regression
----------------------------------------------------------------------
Equation          Obs  Parms        RMSE    "R-sq"       chi2        P
----------------------------------------------------------------------
y1               1000      3    26.71598   -0.1711      13.76   0.0032
y2               1000      3    26.19243   -1.8229      59.07   0.0000
----------------------------------------------------------------------

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   2.549894   1.563719     1.63   0.103    -.5149385    5.614726
          x2 |  -3.443286   1.023338    -3.36   0.001    -5.448992   -1.437579
          y2 |   .7138822    .264056     2.70   0.007      .196342    1.231422
       _cons |   10.61793   1.006379    10.55   0.000     8.645468      12.5904
-------------+----------------------------------------------------------------
y2           |
          x1 |  -5.942887   .9132913    -6.51   0.000    -7.732905   -4.152869
          x3 |   5.445124   1.226737     4.44   0.000     3.040764    7.849484
          y1 |  -.9107926   .4018143    -2.27   0.023    -1.698334     -.123251
       _cons |   12.70989   4.843172     2.62   0.009     3.217444    22.20233
------------------------------------------------------------------------------
Endogenous variables:  y1 y2
Exogenous variables:   x1 x2 x3
------------------------------------------------------------------------------
Unit 6: Policy Analysis
Before-and-after comparisons
Advantages: simplicity
Disadvantage: natural history, natural trend.
Controlling for effects of time
Difference-in-Difference estimation
“Counterfactual”
Natural experiments
Exogeneity requirements
Criticisms
Serial correlation
Exogeneity of policy
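The difference-in-difference idea above can be sketched numerically (Python/NumPy; simulated treatment and period indicators, not from the guide):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4000
treated = rng.integers(0, 2, n)   # group indicator
post = rng.integers(0, 2, n)      # period indicator
effect = 3.0                      # true treatment effect
# Group and time effects mimic "natural history" and group differences
y = (1.0 + 2.0 * treated + 1.5 * post
     + effect * treated * post + rng.normal(size=n))

# DiD: (treated post-pre change) minus (control post-pre change)
did = ((y[(treated == 1) & (post == 1)].mean()
        - y[(treated == 1) & (post == 0)].mean())
       - (y[(treated == 0) & (post == 1)].mean()
          - y[(treated == 0) & (post == 0)].mean()))
print(did)  # recovers the treatment effect, netting out group and time trends
```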
Unit 7: Panel Data Models
Panel data model: yit = xitβ + eit;  eit = ci + uit.
Repeated observations of same individuals over time.
Permanent component and transitory component to unobservable.
Strict exogeneity: E[uit | xis] = 0.
Error structure: E[ei ei′] = IT×T σu² + 1T×T σc².
Pooled OLS (POLS): treats all observations as if from distinct individuals.
Unbiased if E[eit | xit] = 0; requires strict exogeneity and E[ci | xit] = 0.
Variance calculation incorrect
“Clustered” standard errors
Inefficient, because of cross-correlation between unobservables.
Random effects (RE): GLS with ΩRE⁻¹ = E[ei ei′] = IT×T σu² + 1T×T σc².
Estimator: β̂RE = (X′ΩREX)⁻¹(X′ΩREY).
Unbiased if E[eit | xit] = 0; requires strict exogeneity and E[ci | xit] = 0.
Most precise.
Fixed effects: OLS with transformed data.
Fixed effects transformation: xitFE = xit − x̄i,  yitFE = yit − ȳi.
Estimator: β̂FE = (XFE′XFE)⁻¹(XFE′YFE).
Unbiased if E[eit | xit] = 0; requires only strict exogeneity.
Possibly inefficient.
First differences: OLS with differenced data.
First-differences transformation: Δxit = xit − xi,t−1,  Δyit = yit − yi,t−1.
Estimator: β̂FD = (ΔX′ΔX)⁻¹(ΔX′ΔY).
Unbiased if E[Δeit | Δxit] = 0; requires (less than) strict exogeneity.
Possibly inefficient.
Dummy variables: OLS with a dummy variable for each individual.
Equivalent to Fixed Effects.
Relaxation of strict exogeneity
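A numerical sketch of the pooled-OLS bias and the fixed effects (within) transformation (Python/NumPy; simulated panel, not from the guide):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 500, 4
c = rng.normal(size=N)                    # permanent individual effect c_i
x = rng.normal(size=(N, T)) + c[:, None]  # x correlated with c_i
y = 2.0 * x + c[:, None] + rng.normal(size=(N, T))

# Pooled OLS: biased because E[c_i | x_it] != 0
b_pols = np.cov(x.ravel(), y.ravel())[0, 1] / np.var(x.ravel(), ddof=1)

# Within (fixed effects) transformation: subtract individual means
x_fe = x - x.mean(axis=1, keepdims=True)
y_fe = y - y.mean(axis=1, keepdims=True)
b_fe = (x_fe.ravel() @ y_fe.ravel()) / (x_fe.ravel() @ x_fe.ravel())
print(b_pols, b_fe)  # pooled slope biased away from 2; FE slope close to 2
```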
Examples of Stata commands
Dataset originally in “wide” format: 1000 observations of the variables y73, y74, y75, xa73, xa74, xa75, xb73, xb74, xb75, xc (time-invariant), and id (identifier).
. reshape long y xa xb, i(id) j(year)
(note: j = 73 74 75)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1000   ->   3000
Number of variables                  11   ->      6
j variable (3 values)                     ->   year
xij variables:
                            y73 y74 y75   ->   y
                         xa73 xa74 xa75   ->   xa
                         xb73 xb74 xb75   ->   xb
-----------------------------------------------------------------------------
. reg y xa xb xc, cluster(id)

Linear regression                                      Number of obs =    3000
                                                       F(  3,   999) = 1221.84
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5720
                                                       Root MSE      =  5.0551

                                  (Std. Err. adjusted for 1000 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xa |   2.149723   .0918978    23.39   0.000     1.969388    2.330058
          xb |  -3.151929   .0910431   -34.62   0.000    -3.330586   -2.973271
          xc |   4.564069   .1006943    45.33   0.000     4.366472    4.761665
       _cons |   1.054455   .0942891    11.18   0.000     .8694272    1.239482
------------------------------------------------------------------------------
. xtreg y xa xb xc, re i(id)
Random-effects GLS regression                   Number of obs      =      3000
Group variable: id                              Number of groups   =      1000

R-sq:  within  = 0.3781                         Obs per group: min =         3
       between = 0.7303                                        avg =       3.0
       overall = 0.5720                                        max =         3

Random effects u_i ~ Gaussian                   Wald chi2(3)       =   3915.38
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xa |   2.151934   .0920012    23.39   0.000     1.971615    2.332253
          xb |  -3.150849   .0906708   -34.75   0.000     -3.32856   -2.973137
          xc |   4.564102   .0970299    47.04   0.000     4.373927    4.754278
       _cons |   1.054451   .0942954    11.18   0.000     .8696358    1.239267
-------------+----------------------------------------------------------------
     sigma_u |  .73494385
     sigma_e |  5.0024879
         rho |  .02112818   (fraction of variance due to u_i)
------------------------------------------------------------------------------
. est sto reest
. xtreg y xa xb xc, fe i(id)
note: xc omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      3000
Group variable: id                              Number of groups   =      1000

R-sq:  within  = 0.3782                         Obs per group: min =         3
       between = 0.1315                                        avg =       3.0
       overall = 0.2423                                        max =         3

                                                F(2,1998)          =    607.55
corr(u_i, Xb)  = -0.0056                        Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xa |   2.204303   .1121128    19.66   0.000     1.984432    2.424173
          xb |  -3.125336    .108589   -28.78   0.000    -3.338295   -2.912376
          xc |  (omitted)
       _cons |   .9069813   .0913603     9.93   0.000     .7278098    1.086153
-------------+----------------------------------------------------------------
     sigma_u |  5.3422733
     sigma_e |  5.0024879
         rho |  .53281074   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(999, 1998) =     1.06           Prob > F = 0.1326
. est sto feest
. hausman feest reest
                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     feest        reest        Difference          S.E.
-------------+----------------------------------------------------------------
          xa |    2.204303     2.151934        .0523684        .0640706
          xb |   -3.125336    -3.150849        .0255129        .0597526
------------------------------------------------------------------------------
                          b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        0.82
                Prob>chi2 =      0.6640
. reshape wide xa xb y, i(id) j(year)
(note: j = 73 74 75)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     3000   ->   1000
Number of variables                   8   ->     13
j variable (3 values)              year   ->   (dropped)
xij variables:
                                     xa   ->   xa73 xa74 xa75
                                     xb   ->   xb73 xb74 xb75
                                      y   ->   y73 y74 y75
-----------------------------------------------------------------------------
. gen dy74 = y74-y73
. gen dy75 = y75-y74
. gen dxa74 = xa74-xa73
. gen dxa75 = xa75-xa74
. gen dxb74 = xb74-xb73
. gen dxb75 = xb75-xb74
. reshape long xa xb y dy dxa dxb, i(id) j(year)
(note: j = 73 74 75)
(note: dy73 not found)
(note: dxa73 not found)
(note: dxb73 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                     1000   ->   3000
Number of variables                  19   ->     11
j variable (3 values)                     ->   year
xij variables:
                         xa73 xa74 xa75   ->   xa
                         xb73 xb74 xb75   ->   xb
                            y73 y74 y75   ->   y
                         dy73 dy74 dy75   ->   dy
                      dxa73 dxa74 dxa75   ->   dxa
                      dxb73 dxb74 dxb75   ->   dxb
-----------------------------------------------------------------------------
. reg dy dxa dxb
      Source |       SS       df       MS              Number of obs =    2000
-------------+------------------------------           F(  2,  1997) =  414.78
       Model |  47512.8571     2  23756.4285           Prob > F      =  0.0000
    Residual |  114377.487  1997  57.2746556           R-squared     =  0.2935
-------------+------------------------------           Adj R-squared =  0.2928
       Total |  161890.344  1999   80.985665           Root MSE      =   7.568

------------------------------------------------------------------------------
          dy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         dxa |   2.930953   .1228469    23.86   0.000     2.690032    3.171875
         dxb |  -2.665305    .120511   -22.12   0.000    -2.901646   -2.428965
       _cons |  -.1314784   .1692294    -0.78   0.437    -.4633631    .2004063
------------------------------------------------------------------------------
Unit 8: Discrete and Limited Dependent Variables
Maximum-likelihood estimation
Philosophy: find the parameters that make the observation most likely.
General technique
1. Select a probability distribution to model the phenomenon.
2. Write out the likelihood of observing the outcome as a function of the unknown parameters.
3. Find the values of the parameters that make the observed data most probable.
Examples
Binomial outcome
Linear model with normally distributed unobservables
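The three-step technique can be sketched in Python for the binomial example (the numbers below are invented for illustration):

```python
import math

# Step 1: model k successes in n trials as binomial(n, p).
k, n = 37, 100   # illustrative data

# Step 2: log-likelihood as a function of the unknown parameter p
# (log of C(n,k) p^k (1-p)^(n-k), dropping the constant binomial term).
def log_likelihood(p):
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Step 3: maximize. A grid search suffices here; the analytic MLE is
# the sample frequency k/n.
p_hat = max((i / 1000 for i in range(1, 1000)), key=log_likelihood)
print(p_hat)   # 0.37
```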
Binary outcome models: yi = 0 or yi = 1.
Objects of interest: what we want to know.
Predicted probabilities: P[yi = 1 | xi]
Marginal effects: ∂P[yi = 1 | xi] / ∂xi
Linear probability model: OLS with binary outcome yi.
Advantages
Simplicity: easily calculated.
Ease of interpretation: the β̂ are estimated marginal effects; xi β̂ is the predicted probability.
Permits IV and panel data techniques.
Disadvantages
Heteroskedasticity: given xi, ei takes one of two values.
Implausible predicted probabilities: P[yi = 1 | xi] can be less than 0 or greater than 1.
Inconsistency with models.
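The implausible-prediction problem is easy to demonstrate with a closed-form simple regression on toy binary data (invented for illustration):

```python
# Toy binary-outcome data: OLS intercept and slope in closed form.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# b1 is the estimated marginal effect; b0 + b1*x the "predicted probability"
print(b0 + b1 * 5)   # prediction above 1
print(b0 + b1 * 0)   # prediction below 0
```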
Maximum likelihood models
“Latent value” formulation: yi = 1 if xiβ + ei > 0, and yi = 0 otherwise.
Likelihood function: P[yi | xi] = [1 − CDF(−xiβ)]^yi · [CDF(−xiβ)]^(1−yi)
Choice of distribution for ei
Probit model: for normal distribution.
Marginal effects
Logit
Odds ratios
Interpretation of estimated coefficients
Not marginal effects
Sign and relative magnitude only
Marginal effects at the average, ∂P[y = 1 | x] / ∂x (probit)
Odds ratio: P[y = 1 | x + 1] / P[y = 1 | x] (logit)
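A numerical check of the logit interpretation (coefficients below are made up): in a logit, the odds P[y = 1 | x] / P[y = 0 | x] equal exp(xβ), so a one-unit increase in x multiplies the odds by exp(β), no matter where x starts.

```python
import math

def logit_p(xb):
    """P[y = 1 | x] under the logit model."""
    return 1 / (1 + math.exp(-xb))

def odds(p):
    return p / (1 - p)

# Illustrative coefficients (invented)
beta0, beta1, x = -0.5, 0.8, 1.3
ratio = odds(logit_p(beta0 + beta1 * (x + 1))) / odds(logit_p(beta0 + beta1 * x))
print(ratio, math.exp(beta1))   # the two agree
```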
Instrumental variables: the “forbidden regression”
(IV probit)
Logistic regression
Multiple discrete outcomes
Multiple unordered outcomes: multinomial logit
Interpretation
Multiple ordered/ranked outcomes, no scale: ordered probit
Count data: Poisson regression
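Poisson regression also fits the maximum-likelihood template: model the conditional mean as E[y | x] = exp(xβ) and maximize the Poisson log-likelihood. A minimal sketch (data values below are made up):

```python
import math

def poisson_loglik(beta, xs, ys):
    """Log-likelihood for Poisson regression with E[y | x] = exp(x*beta)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        lam = math.exp(x * beta)   # conditional mean
        # log Poisson pmf: y*log(lam) - lam - log(y!)
        ll += y * math.log(lam) - lam - math.lgamma(y + 1)
    return ll

# At beta = 0, lam = 1 for every observation.
print(poisson_loglik(0.0, [1.0, 2.0], [0, 3]))
```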
Censored regression
Tobit
Sample selection
Heckman
Examples of Stata commands:
. probit y x1 x2

Iteration 0:   log likelihood = -1387.1993
Iteration 1:   log likelihood =  -1260.593
Iteration 2:   log likelihood = -1260.3134
Iteration 3:   log likelihood = -1260.3134

Probit regression                                 Number of obs   =       2134
                                                  LR chi2(2)      =     253.77
                                                  Prob > chi2     =     0.0000
Log likelihood = -1260.3134                       Pseudo R2       =     0.0915

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .4028099    .031099    12.95   0.000      .341857    .4637628
          x2 |  -.2626652   .0297566    -8.83   0.000    -.3209871   -.2043433
       _cons |   .4113152   .0292439    14.07   0.000     .3539983    .4686322
------------------------------------------------------------------------------
. logit y x1 x2

Iteration 0:   log likelihood = -1387.1993
Iteration 1:   log likelihood = -1262.4423
Iteration 2:   log likelihood = -1261.0108
Iteration 3:   log likelihood = -1261.0105
Iteration 4:   log likelihood = -1261.0105

Logistic regression                               Number of obs   =       2134
                                                  LR chi2(2)      =     252.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -1261.0105                       Pseudo R2       =     0.0910

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .665104   .0529186    12.57   0.000     .5613854    .7688226
          x2 |  -.4295009   .0497464    -8.63   0.000    -.5270021   -.3319997
       _cons |   .6753662   .0490636    13.77   0.000     .5792033    .7715291
------------------------------------------------------------------------------
. mlogit y x1 x2 x3

Iteration 0:   log likelihood = -3433.7503
Iteration 1:   log likelihood = -3295.8062
Iteration 2:   log likelihood = -3289.9601
Iteration 3:   log likelihood = -3289.9381
Iteration 4:   log likelihood = -3289.9381

Multinomial logistic regression                   Number of obs   =       3313
                                                  LR chi2(6)      =     287.62
                                                  Prob > chi2     =     0.0000
Log likelihood = -3289.9381                       Pseudo R2       =     0.0419

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
          x1 |   -.222846   .0407254    -5.47   0.000    -.3026663   -.1430256
          x2 |   .0026875   .0395999     0.07   0.946     -.074927     .080302
          x3 |   -.032302   .0392481    -0.82   0.410    -.1092268    .0446228
       _cons |  -.1885108   .0391199    -4.82   0.000    -.2651843   -.1118373
-------------+----------------------------------------------------------------
2            |
          x1 |  -.7996707   .0541183   -14.78   0.000    -.9057406   -.6936007
          x2 |  -.1135117   .0509981    -2.23   0.026    -.2134661   -.0135573
          x3 |  -.3267343    .051504    -6.34   0.000    -.4276803   -.2257884
       _cons |  -1.070102   .0547543   -19.54   0.000    -1.177419   -.9627856
-------------+----------------------------------------------------------------
3            |  (base outcome)
------------------------------------------------------------------------------