Econometrics using R
Rajat Tayal
Fourth Quantitative Finance Workshop
December 21-December 24, 2012
Indian Institute of Technology, Kanpur
23 December 2012
Rajat Tayal (IIT Kanpur), Introduction to Estimation/Computing Environment - II
23 December 2012
Outline of the presentation
Linear regression
- Simple linear regression
- Multiple linear regression
- Partially linear models
- Factors, interactions, and weights
- Linear regression with time series data
- Linear regression with panel data
- Systems of linear equations
Regression diagnostics
- Leverage and standardized residuals
- Deletion diagnostics
- The function influence.measures()
- Testing for heteroskedasticity
- Testing for functional form
- Testing for autocorrelation
- Robust standard errors and tests
Part I
Linear regression
Introduction
The linear regression model, typically estimated by ordinary least squares (OLS), is the workhorse of applied econometrics. The model is

y_i = x_i^T β + ε_i,   i = 1, 2, ..., n,   (1)

or, in matrix form,

y = Xβ + ε.   (2)

For cross-sections:

E(ε|X) = 0   (3)
Var(ε|X) = σ² I   (4)

For time series:

E(ε_j | x_i) = 0,   i ≤ j.   (5)
Introduction
The OLS estimator is

β̂ = (X^T X)^(-1) X^T y.   (6)

The corresponding fitted values are ŷ = Xβ̂, the residuals are ε̂ = y - ŷ, and the residual sum of squares is ε̂^T ε̂.
In R, models are typically fitted by calling a model-fitting function, in this case lm(), with a formula object describing the model and a data.frame object containing the variables used in the formula.
fm <- lm(formula, data, ...)
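As a quick sanity check, equation (6) can be computed directly from the normal equations and compared with lm(). This is a minimal sketch on simulated data; the data, seed, and variable names are made up for illustration:

```r
## OLS by hand: beta_hat = (X'X)^{-1} X'y, compared against lm()
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)            # simulated data, true beta = (2, 0.5)
X <- cbind(1, x)                       # design matrix with intercept column
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # solve the normal equations
fit <- lm(y ~ x)
all.equal(as.vector(beta_hat), unname(coef(fit)))   # TRUE
```

In practice one never forms (X'X)^{-1} explicitly; lm() uses a QR decomposition, which is numerically more stable.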
The first example
In view of the wide range of the variables, combined with a considerable amount of skewness, it is useful to take logarithms.
The goal is to estimate the effect of the price per citation on the number of library subscriptions.
To explore this issue quantitatively, we will fit a linear regression model,

log(subs)_i = β₁ + β₂ log(citeprice)_i + ε_i.   (7)
The first example
Here, the formula of interest is log(subs) ~ log(citeprice). This can be used both for plotting and for model fitting:

> plot(log(subs) ~ log(citeprice), data = journals)
> jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)
> abline(jour_lm)

abline() extracts the coefficients of the fitted model and adds the corresponding regression line to the plot.
The first example
The function lm() returns a fitted-model object, here stored as jour_lm. It is an object of class "lm".

> class(jour_lm)
[1] "lm"
> names(jour_lm)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
The first example
> summary(jour_lm)

Call:
lm(formula = log(subs) ~ log(citeprice), data = journals)

Residuals:
     Min       1Q   Median       3Q      Max
-2.72478 -0.53609  0.03721  0.46619  1.84808

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     4.76621    0.05591   85.25   <2e-16 ***
log(citeprice) -0.53305    0.03561  -14.97   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7497 on 178 degrees of freedom
Multiple R-squared: 0.5573, Adjusted R-squared: 0.5548
F-statistic: 224 on 1 and 178 DF, p-value: < 2.2e-16
Generic functions for fitted (linear) model objects
Function                        Description
print()                         simple printed display
summary()                       standard regression output
coef() (or coefficients())      extracting the regression coefficients
residuals() (or resid())        extracting residuals
fitted() (or fitted.values())   extracting fitted values
anova()                         comparison of nested models
predict()                       predictions for new data
plot()                          diagnostic plots
confint()                       confidence intervals for the regression coefficients
deviance()                      residual sum of squares
vcov()                          (estimated) variance-covariance matrix
logLik()                        log-likelihood (assuming normally distributed errors)
AIC()                           information criteria including AIC, BIC/SBC (assuming normally distributed errors)
The first example
It is instructive to take a brief look at what the summary() method returns for a fitted "lm" object:

> jour_slm <- summary(jour_lm)
> class(jour_slm)
[1] "summary.lm"
> names(jour_slm)
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"
> jour_slm$coefficients
                 Estimate  Std. Error   t value      Pr(>|t|)
(Intercept)     4.7662121  0.05590908  85.24934 2.953913e-146
log(citeprice) -0.5330535  0.03561320 -14.96786  2.563943e-33
Analysis of variance
> anova(jour_lm)
Analysis of Variance Table
Response: log(subs)
Df Sum Sq Mean Sq F value Pr(>F)
log(citeprice) 1 125.93 125.934 224.04 < 2.2e-16 ***
Residuals 178 100.06 0.562
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA table breaks the sum of squares about the mean (for the dependent variable, here log(subs)) into two parts: a part that is accounted for by a linear function of log(citeprice) and a part attributed to residual variation.
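This decomposition can be verified numerically. A minimal sketch on simulated (made-up) data, checking that the total sum of squares about the mean splits exactly into the model and residual parts for a model with an intercept:

```r
## ANOVA identity: TSS = model SS + residual SS
set.seed(42)
x <- rnorm(50)
y <- 1 + x + rnorm(50)                      # simulated data
fit <- lm(y ~ x)
tss <- sum((y - mean(y))^2)                 # total SS about the mean
mss <- sum((fitted(fit) - mean(y))^2)       # SS explained by the model
rss <- deviance(fit)                        # residual sum of squares
all.equal(tss, mss + rss)                   # TRUE
```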
Point and Interval estimates
To extract the estimated regression coefficients β̂, the function coef() can be used:
> coef(jour_lm)
(Intercept) log(citeprice)
  4.7662121     -0.5330535
> confint(jour_lm, level = 0.95)
2.5 % 97.5 %
(Intercept) 4.6558822 4.8765420
log(citeprice) -0.6033319 -0.4627751
Prediction
Two types of predictions:
1. the prediction of points on the regression line, and
2. the prediction of a new data value.
The standard errors of predictions for new data take into account both the uncertainty in the regression line and the variation of the individual points about the line.
Thus, the prediction interval for prediction of new data is larger than that for prediction of points on the line. The function predict() provides both types of standard errors.
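The difference between the two interval types is easy to see numerically. A small sketch on simulated (made-up) data: the point estimates coincide, while the prediction interval is strictly wider:

```r
## Confidence vs. prediction intervals from predict()
set.seed(7)
x <- rnorm(80)
y <- 3 - 0.5 * x + rnorm(80)               # simulated data
fit <- lm(y ~ x)
nd <- data.frame(x = 0.5)                   # a new x value
cint <- predict(fit, newdata = nd, interval = "confidence")
pint <- predict(fit, newdata = nd, interval = "prediction")
cint[, "fit"] == pint[, "fit"]              # TRUE: same point estimate
## prediction interval is wider (it adds the new observation's variance)
(pint[, "upr"] - pint[, "lwr"]) > (cint[, "upr"] - cint[, "lwr"])   # TRUE
```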
Prediction
> predict(jour_lm, newdata = data.frame(citeprice = 2.11),
interval = "confidence")
fit lwr upr
1 4.368188 4.247485 4.48889
> predict(jour_lm, newdata = data.frame(citeprice = 2.11),
interval = "prediction")
fit lwr upr
1 4.368188 2.883746 5.852629
The point estimates are identical (fit), but the intervals differ. The prediction intervals can also be used for computing and visualizing confidence bands.
Prediction
> lciteprice <- seq(from = -6, to = 4, by = 0.25)
> jour_pred <- predict(jour_lm, interval = "prediction",
+   newdata = data.frame(citeprice = exp(lciteprice)))
> plot(log(subs) ~ log(citeprice), data = journals)
> lines(jour_pred[, 1] ~ lciteprice, col = 1)
> lines(jour_pred[, 2] ~ lciteprice, col = 1, lty = 2)
> lines(jour_pred[, 3] ~ lciteprice, col = 1, lty = 2)
Prediction
Figure: Scatterplot with prediction intervals for the journals data
Plotting lm objects
The plot() method for class "lm" provides six types of diagnostic plots, four of which are shown by default. We set the graphical parameter mfrow to c(2, 2) using the par() function, creating a 2 x 2 matrix of plotting areas so that all four plots are displayed simultaneously:
> par(mfrow = c(2, 2))
> plot(jour_lm)
> par(mfrow = c(1, 1))
The first plot is a graph of residuals versus fitted values, the second is a QQ plot for normality, and plots three and four are a scale-location plot and a plot of standardized residuals against leverages, respectively.
Plotting lm objects
Figure: Diagnostic plots for the journals data
Testing a linear hypothesis
The standard regression output as provided by summary() only indicates individual significance of each regressor and joint significance of all regressors, in the form of t and F statistics, respectively. Often it is necessary to test more general hypotheses.
This is possible using the function linear.hypothesis() from the car package. Suppose we want to test the hypothesis that the elasticity of the number of library subscriptions with respect to the price per citation equals -0.5:

H0: β₂ = -0.5   (8)
Testing a linear hypothesis
> linear.hypothesis(jour_lm, "log(citeprice) = -0.5")
Linear hypothesis test
Hypothesis:
log(citeprice) = - 0.5
Model 1: restricted model
Model 2: log(subs) ~ log(citeprice)
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    179 100.54
2    178 100.06  1   0.48421 0.8614 0.3546
Multiple linear regression
In economics, most regression analyses comprise more than a single regressor. Often there are regressors of a special type, usually referred to as dummy variables in econometrics, which are used for coding categorical variables.
> data("CPS1988")
> summary(CPS1988)
      wage           education       experience    ethnicity     smsa          region         parttime
 Min.   :   50.05   Min.   : 0.00   Min.   :-4.0   cauc:25923   no : 7223   northeast:6441   no :25631
 1st Qu.:  308.64   1st Qu.:12.00   1st Qu.: 8.0   afam: 2232   yes:20932   midwest  :6863   yes: 2524
 Median :  522.32   Median :12.00   Median :16.0                            south    :8760
 Mean   :  603.73   Mean   :13.07   Mean   :18.2                            west     :6091
 3rd Qu.:  783.48   3rd Qu.:15.00   3rd Qu.:27.0
 Max.   :18777.20   Max.   :18.00   Max.   :63.0
The model of interest is
The model of interest is

log(wage) = β₁ + β₂ experience + β₃ experience² + β₄ education + β₅ ethnicity + ε.   (9)
Multiple linear regression
> cps_lm <- lm(log(wage) ~ experience + I(experience^2) + education +
+   ethnicity, data = CPS1988)
> summary(cps_lm)
Call:
lm(formula = log(wage) ~ experience + I(experience^2) + education +
ethnicity, data = CPS1988)
Residuals:
Min 1Q Median 3Q Max
-2.9428 -0.3162  0.0580  0.3756  4.3830

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.321e+00 1.917e-02 225.38
Comparison of models
With more than a single explanatory variable, it is interesting to test for the relevance of subsets of regressors. For any two nested models, this can be done using the function anova(). E.g., to test for the relevance of the variable ethnicity, we explicitly fit the model without ethnicity and then compare both models:

> cps_noeth <- lm(log(wage) ~ experience + I(experience^2) + education,
+   data = CPS1988)
> anova(cps_noeth, cps_lm)
Analysis of Variance Table
Model 1: log(wage) ~ experience + I(experience^2) + education
Model 2: log(wage) ~ experience + I(experience^2) + education + ethnicity
Res.Df RSS Df Sum of Sq F Pr(>F)
1 28151 9719.6
2  28150 9598.6  1    121.02 354.91 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This reveals that the effect of ethnicity is significant at any reasonable level.
Comparison of models
> waldtest(cps_lm, . ~ . - ethnicity)

Wald test

Model 1: log(wage) ~ experience + I(experience^2) + education + ethnicity
Model 2: log(wage) ~ experience + I(experience^2) + education
  Res.Df Df      F    Pr(>F)
1  28150
2  28151 -1 354.91 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Part II
Linear regression with panel data
Introduction
There has been considerable interest in panel data econometrics over the last two decades.
The package plm (Croissant and Millo 2008) contains the relevant fitting functions and methods for such specifications in R.
Two types of panel data models:
1. Static linear models
2. Dynamic linear models
Introduction
For illustrating the basic fixed- and random-effects methods, we use the well-known Grunfeld data (Grunfeld 1958), comprising 20 annual observations on the three variables real gross investment (invest), real value of the firm (value), and real value of the capital stock (capital) for 11 large US firms for the years 1935-1954.
> data("Grunfeld", package = "AER")
> library("plm")
> gr
One-way panel regression
invest_it = β₁ value_it + β₂ capital_it + α_i + ν_it   (10)

where i = 1, ..., n, t = 1, ..., T, and the α_i denote the individual-specific effects. A fixed-effects version is estimated by running OLS on a within-transformed model:

> gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within")
> summary(gr_fe)
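The within transformation itself is simple to carry out by hand. A sketch on simulated (made-up) panel data, checking that OLS on individually demeaned data reproduces the slope from a least-squares dummy-variable (LSDV) regression; only base R is used here, so this does not rely on plm:

```r
## Within (fixed-effects) transformation by hand
set.seed(3)
n <- 5; T <- 10
id <- rep(1:n, each = T)                    # individual index
alpha <- rep(rnorm(n, sd = 2), each = T)    # individual-specific effects
x <- rnorm(n * T)
y <- alpha + 0.7 * x + rnorm(n * T)         # simulated panel, true slope 0.7
demean <- function(v, g) v - ave(v, g)      # subtract individual means
dy <- demean(y, id)
dx <- demean(x, id)
fe_within <- coef(lm(dy ~ dx - 1))[["dx"]]           # within estimator
fe_lsdv <- coef(lm(y ~ x + factor(id)))[["x"]]       # LSDV estimator
all.equal(fe_within, fe_lsdv)               # TRUE: identical slopes
```

The point estimates coincide exactly; only the degrees-of-freedom correction for the standard errors differs between the two formulations, which plm's summary() handles for you.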
Oneway (individual) effect Within Model
Call:
plm(formula = invest ~ value + capital, data = pgr, model = "within")
Balanced Panel: n=3, T=20, N=60
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-167.00 -26.10 2.09 26.80 202.00
Coefficients :
         Estimate Std. Error t-value  Pr(>|t|)
value    0.104914   0.016331  6.4242 3.296e-08 ***
capital  0.345298   0.024392 14.1564 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 1888900
Residual Sum of Squares: 243980
R-Squared: 0.87084
Adj. R-Squared: 0.79827
F-statistic: 185.407 on 2 and 55 DF, p-value: < 2.22e-16
One-way panel regression
A two-way model could have been estimated upon setting effect = "twoways".
If fixed effects need to be inspected, a fixef() method and an associated summary() method are available.
To check whether the fixed effects are really needed, we compare the fixed-effects and the pooled OLS fits by means of pFtest():

> gr_pool <- plm(invest ~ value + capital, data = pgr, model = "pooling")
> pFtest(gr_fe, gr_pool)
F test for individual effects
data: invest ~ value + capital
F = 56.8247, df1 = 2, df2 = 55, p-value = 4.148e-14
alternative hypothesis: significant effects
One-way panel regression
It is also possible to fit a random-effects version of (10) using the same fitting function upon setting model = "random" and selecting a method for estimating the variance components. Four methods are available: Swamy-Arora, Amemiya, Wallace-Hussain, and Nerlove.

> gr_re <- plm(invest ~ value + capital, data = pgr, model = "random",
+   random.method = "walhus")
> summary(gr_re)
One-way panel regression
Oneway (individual) effect Random Effect Model
(Wallace-Hussain's transformation)

Call:
plm(formula = invest ~ value + capital, data = pgr, model = "random",
    random.method = "walhus")
Balanced Panel: n=3, T=20, N=60
Effects:
var std.dev share
idiosyncratic 4389.31 66.25 0.352
individual 8079.74 89.89 0.648
theta: 0.8374
Residuals :
Min. 1st Qu. Median 3rd Qu. Max.
-187.00 -32.90 6.96 31.40 210.00
Coefficients:
               Estimate Std. Error t-value  Pr(>|t|)
(Intercept) -109.976572  61.701384 -1.7824   0.08001 .
value          0.104280   0.014996  6.9539 3.797e-09 ***
capital        0.344784   0.024520 14.0613 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares: 1988300
Residual Sum of Squares: 257520
R-Squared: 0.87048
Adj. R-Squared: 0.82696
F-statistic: 191.545 on 2 and 57 DF, p-value: < 2.22e-16
One-way panel regression
A comparison of the regression coefficients shows that fixed- and random-effects methods yield rather similar results for these data.
To check whether the random effects are really needed, a Lagrange multiplier test is available in plmtest(), defaulting to the test proposed by Honda (1985):

> plmtest(gr_pool)
Lagrange Multiplier Test - (Honda)
data: invest ~ value + capital
normal = 15.4704, p-value < 2.2e-16
alternative hypothesis: significant effects
One-way panel regression
Random-effects methods are more efficient than the fixed-effects estimator under more restrictive assumptions, namely exogeneity of the individual effects. It is therefore important to test for endogeneity, and the standard approach employs a Hausman test. The relevant function phtest() requires two panel regression objects, in our case yielding:
> phtest(gr_re, gr_fe)
Hausman Test
data: invest ~ value + capital
chisq = 0.0404, df = 2, p-value = 0.98
alternative hypothesis: one model is inconsistent

In line with the rather similar estimates presented above, endogeneity does not appear to be a problem here.
Dynamic linear models
To conclude this section, we present a more advanced example, the dynamic panel data model

y_it = Σ_{j=1}^{p} ρ_j y_{i,t-j} + x_it^T β + u_it,   u_it = α_i + θ_t + ν_it.   (11)

This is estimated by the method of Arellano and Bond (1991), viz. a generalized method of moments (GMM) estimator utilizing lagged endogenous regressors after a first-differences transformation.
Dynamic linear models
> data("EmplUK", package = "plm")
> form <- log(emp) ~ log(wage) + log(capital) + log(output)
> empl_ab <- pgmm(dynformula(form, list(2, 1, 0, 1)), data = EmplUK,
+   index = c("firm", "year"), effect = "twoways", model = "twosteps",
+   gmm.inst = ~ log(emp), lag.gmm = list(c(2, 99)))
> summary(empl_ab)
8/9/2019 Introduction Econometrics R
39/48
Dynamic linear models
Twoways effects Two steps model
Call:
pgmm(formula = dynformula(form, list(2, 1, 0, 1)), data = EmplUK,
    effect = "twoways", model = "twosteps", index = c("firm", "year"),
    ... = list(gmm.inst = ~log(emp), lag.gmm = list(c(2, 99))))
Unbalanced Panel: n=140, T=7-9, N=1031
Number of Observations Used: 611
Residuals
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.6191000 -0.0255700 0.0000000 -0.0001339 0.0332000 0.6410000
Coefficients
Estimate Std. Error z-value Pr(>|z|)
lag(log(emp), c(1, 2))1 0.474151 0.085303 5.5584 2.722e-08 ***
lag(log(emp), c(1, 2))2 -0.052967   0.027284  -1.9413 0.0522200 .
log(wage)               -0.513205   0.049345 -10.4003 < 2.2e-16 ***
lag(log(wage), 1) 0.224640 0.080063 2.8058 0.0050192 **
log(capital) 0.292723 0.039463 7.4177 1.191e-13 ***
log(output) 0.609775 0.108524 5.6188 1.923e-08 ***
lag(log(output), 1) -0.446373 0.124815 -3.5763 0.0003485 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Dynamic linear models
Sargan Test: chisq(25) = 30.11247 (p.value=0.22011)
Autocorrelation test (1): normal = -2.427829 (p.value=0.0075948)
Autocorrelation test (2): normal = -0.3325401 (p.value=0.36974)
Wald test for coefficients: chisq(7) = 371.9877 (p.value = < 2.22e-16)
Wald test for time dummies: chisq(6) = 26.9045 (p.value = 0.0001509)

The results suggest that autoregressive dynamics are important for these data.
Part III
Regression diagnostics
Review
> data("Journals")
> journals <- Journals[, c("subs", "price")]
> journals$citeprice <- Journals$price/Journals$citations
> journals$age <- 2000 - Journals$foundingyear
> jour_lm <- lm(log(subs) ~ log(citeprice), data = journals)
8/9/2019 Introduction Econometrics R
43/48
Testing for heteroskedasticity
For cross-section regressions, the assumption Var(ε_i|x_i) = σ² is typically in doubt. A popular test for checking this assumption is the Breusch-Pagan test (Breusch and Pagan 1979).
For our model fitted to the journals data, stored in jour_lm, the diagnostic plots suggest that the variance decreases with the fitted values or, equivalently, increases with the price per citation.
Hence, the regressor log(citeprice) used in the main model should also be employed for the auxiliary regression.
Under H0, the test statistic of the Breusch-Pagan test approximately follows a χ²_q distribution, where q is the number of regressors in the auxiliary regression (excluding the constant term).
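The studentized (Koenker) flavor of the statistic can be reproduced by hand as n times the R² of the auxiliary regression of the squared residuals on the regressors. A sketch on simulated (made-up) heteroskedastic data:

```r
## Breusch-Pagan test by hand (studentized version): BP = n * R^2 of the
## auxiliary regression of squared residuals on the regressors
set.seed(9)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + x)    # error variance rises with x
fit <- lm(y ~ x)
aux <- lm(residuals(fit)^2 ~ x)             # auxiliary regression
bp <- n * summary(aux)$r.squared            # approx. chi^2 with q = 1 df
pval <- pchisq(bp, df = 1, lower.tail = FALSE)
c(BP = bp, p.value = pval)
```

With the lmtest package loaded, bptest(fit) should give the same studentized statistic on these data.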
Testing for heteroskedasticity
The function bptest() implements all these flavors of the Breusch-Pagan test. By default, it computes the studentized statistic for the auxiliary regression utilizing the original regressors X.
> bptest(jour_lm)
studentized Breusch-Pagan test
data: jour_lm
BP = 9.803, df = 1, p-value = 0.001742

Alternatively, the White test also picks up the heteroskedasticity. It uses the original regressors as well as their squares and interactions in the auxiliary regression, which can be passed as a second formula to bptest():
> bptest(jour_lm, ~ log(citeprice) + I(log(citeprice)^2),
+ data = journals)
studentized Breusch-Pagan test
data: jour_lm
BP = 10.912, df = 2, p-value = 0.004271
Testing the functional form
The assumption E(ε|X) = 0 is crucial for consistency of the least-squares estimator. A typical source for violation of this assumption is a misspecification of the functional form, e.g., by omitting relevant variables. One strategy for testing the functional form is to construct auxiliary variables and assess their significance using a simple F test. This is what Ramsey's RESET does.
The function resettest() defaults to using second and third powers of the fitted values as auxiliary variables.
> resettest(jour_lm)
RESET test

data: jour_lm
RESET = 1.4409, df1 = 2, df2 = 176, p-value = 0.2395
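The idea behind RESET can be replicated with a few lines of base R: fit a deliberately misspecified model, then F-test the added powers of its fitted values. A sketch on simulated (made-up) data where the true relationship is quadratic:

```r
## RESET by hand: add powers of the fitted values and F-test them
set.seed(11)
x <- rnorm(100)
y <- 1 + x + 0.3 * x^2 + rnorm(100)   # true relationship is quadratic
fit <- lm(y ~ x)                       # deliberately misspecified
yhat <- fitted(fit)
aug <- lm(y ~ x + I(yhat^2) + I(yhat^3))
anova(fit, aug)                        # F test on the auxiliary powers
```

A small p-value here signals that the fitted values' powers add explanatory content, i.e. the functional form of the original model is inadequate.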
Testing the functional form
The rainbow test (Utts 1982) takes a different approach to testing the functional form. It fits a model to a subsample (typically the middle 50%) and compares it with the model fitted to the full sample using an F test.

> raintest(jour_lm, order.by = ~ age, data = journals)

Rainbow test
data: jour_lm
Rain = 1.774, df1 = 90, df2 = 88, p-value = 0.003741
The null hypothesis is clearly rejected, signaling that the relationship between the number of subscriptions and the price per citation also depends on the age of the journal.
Testing for autocorrelation
Let us reconsider the first model for the US consumption function:

> library("dynlm")
> data("USMacroG")
> consump1 <- dynlm(consumption ~ dpi + L(dpi), data = USMacroG)
> dwtest(consump1)
Durbin-Watson test
data: consump1
DW = 0.0866, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0

Further tests for autocorrelation are the Box-Pierce test and the Ljung-Box test, both implemented in the function Box.test() in base R.

> Box.test(residuals(consump1), type = "Ljung-Box")
Box-Ljung test
data: residuals(consump1)