
Chapter 3 Bivariate Time Series Analysis with R

    3.1 Autoregressive Distributed Lag (ARDL) Models.

This chapter is concerned with the analysis of more than one variable at a time. Accordingly we consider two variables Yt and Xt and their lags in this subsection. The concepts in this chapter are important, and the restriction to only two variables is not a serious limitation. An extension to more than two variables is usually straightforward, and the R programs implementing these extensions do not generally require much extra work. It is most convenient pedagogically to discuss the matters in the simplest bivariate setting.

Economic Sources of Lags: We have already encountered lags in the univariate economic models of the last chapter. Lags arise in econometrics for several reasons: psychological inertia (habit) of human agents; the time it takes a consumer to feel that his or her income has reached a new plateau at a potentially 'permanent' level; normal delays in real-world implementation and technological constraints that slow changes in the capital-labor mix; and institutional delays due to regulation, labor contracts, and the like.

    Consider a simple regression model where a finite number k of lag terms of the regressor variable X are included on the right hand side:

Yt = α + β0Xt + β1Xt-1 + … + βkXt-k + ut,  ut ~ iid(0, σ2). (3.1.1)
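
Before imposing any structure on the lags, (3.1.1) can be estimated directly by creating the lagged regressors. The following is a minimal hedged sketch on simulated data (an illustration, not part of the original snippets); all numbers are arbitrary assumptions.

#Hedged sketch: unrestricted estimation of (3.1.1) with k=3 lags of x
k=3
x=rnorm(50)
y=2+filter(x, rep(0.5, k+1), sides=1)+rnorm(50)  #illustrative data with equal lag weights
X=embed(x, k+1)            #columns are x_t, x_{t-1}, ..., x_{t-k}
yk=y[(k+1):length(y)]      #line up the dependent variable
reg.fdl=lm(yk ~ X)         #coefficients of X give the lag distribution beta_0..beta_k
coef(reg.fdl)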

If we plot βi against i, the plot is called a lag distribution, which can have various possible shapes and structures. Instead of letting all k+1 coefficients βi be individually and separately determined by a regression fit, it is sometimes parsimonious to link the different β's with each other by a mathematical relation. Some structures considered in the literature are listed below:

Arithmetic Lag: βi = (k+1−i)β, i = 1,…,k, (3.1.2)

where the lag weights βi decline linearly from kβ at i = 1 to β at i = k.

    Inverted V Lag:βi = iβ for i ∈[0, k/2] and βi = (k−i)β for i ∈(k/2, k]. (3.1.3)

The plot of βi against i rises and then falls. The following R snippet implements both the arithmetic and inverted-V lag structures:

    #R3.1.1


####### Make a function to compute weighted sums
lagwt=function(x,w) {
#function to compute sum [reverse(w)*x(t-j)] over j=1 to length(w)
x2=na.omit(x)
print("missing data are assumed to be at the beginning and simply omitted")
n=length(x2); m=length(w); nm=n-m; out=rep(NA, nm)
for (i in 1:nm) {
sumout=x[(1+i-1):(i-1+m)] * rev(w)
out[i]=sum(sumout) }
return(out) }
####### End of the function

#Data sources and descriptions
#Table 935. New Privately Owned Housing Units Started 1960 to 2005
#http://www.census.gov/compendia/statab/construction_housing/authorizations_starts_and_completions/
starts=c(1252, 1313, 1463, 1610, 1529, 1473, 1165, 1292, 1508, 1467, 1434,
 2052, 2357, 2045, 1338, 1160, 1538, 1987, 2020, 1745, 1292, 1084, 1062,
 1703, 1750, 1742, 1805, 1621, 1488, 1376, 1193, 1014, 1200, 1288, 1457,
 1354, 1477, 1474, 1617, 1641, 1569, 1603, 1705, 1848, 1956, 2068)

#Table 941. Median Sales Price of New Privately Owned One-Family
#Houses Sold entire US: 1970 to 2005
#http://www.census.gov/const/www/newressalesindex.html
sales=c(485, 656, 718, 634, 519, 549, 646, 819, 817, 709, 545, 436, 412,
 623, 639, 688, 750, 671, 676, 650, 534, 509, 610, 666, 670, 667, 757,
 804, 886, 880, 877, 908, 973, 1086, 1203, 1283)

par(mfrow=c(2,1)) #ask R to expect 2 plots in 2 rows
for (k in 3:10){
ii=1:k; bet=1; Arith=(k+1-ii)*bet
start2=lagwt(starts,Arith)
# will need to select data starting in 1980 not 1970
n=length(sales); nn=length(start2)
starts2=start2[(nn-n+1):nn]
reg1=lm(sales~starts2)
rsq2=(cor(fitted(reg1),sales))^2
print(c(k,round(rsq2,6))) }
#choose k=3, where rsquare is the largest (still miserably low)
k=3
ii=1:k; bet=1; Arith=(k+1-ii)*bet
plot(ii, Arith, typ="l", main="Arithmetic Lag Weights")
start2=lagwt(starts,Arith)
n=length(sales); nn=length(start2)
starts2=start2[(nn-n+1):nn]
reg1=lm(sales~starts2)
summary(reg1)

for (k in seq(2,10,2)){ # k needs to be an even number here
i=1:(k/2); bet=1; InvV1=i*bet #bet will be estimated
i=(1+(k/2)):k; InvV2=(k-i)*bet
InvV=c(InvV1,InvV2) #combine two parts of inverted V
if (k==4){plot(1:k, InvV, typ="l", main="Inverted V shaped Lag Weights")}
start2=lagwt(starts,InvV) #call a function defined before
n=length(sales); nn=length(start2)
starts2=start2[(nn-n+1):nn] #housing starts weighted, right length
reg2=lm(sales~starts2)
#summary(reg2) #this has negative adjusted Rsquare
rsq2=(cor(fitted(reg2),sales))^2 #0.018
print(c(k,round(rsq2,6))) }
#choose k=2 where r square is largest, though still miserable
k=2
i=1:(k/2); bet=1; InvV1=i*bet
i=(1+(k/2)):k; InvV2=(k-i)*bet
InvV=c(InvV1,InvV2)
start2=lagwt(starts,InvV) #housing starts for chosen k
n=length(sales); nn=length(start2)
starts2=start2[(nn-n+1):nn] #housing starts: chosen k, right length
reg2=lm(sales~starts2)
summary(reg2)
#since chosen k=2 is trivial, we plot for the k=4 case


Koyck Lag Structure (geometrically declining effect of past on current events): In some problems there is no theory to guide us on when to stop lagging the regressors, since any specific choice seems arbitrary. In that case, an attractive option is to use a continuous function f(λ, k, β0) of some variable λ ∈ (0,1), evaluated at integer values of the lag k, given an initial value β0.

f(λ, k, β0) = β0 λ^k or f(λ, k, β0) = β0 (1−λ) λ^k. (3.1.4)

This function is well behaved even if k = ∞. The sum of the coefficients over all lags remains finite even when k is infinitely large, by the geometric series: Σ f(λ, k, β0) = β0(1 + λ + … + λ^k + …), so Σ βk = β0(1−λ)^-1. This method, which assumes that all lag coefficients are of the same sign (positive) and decline geometrically, is called the Koyck method. The rate of decline depends on λ^k, and (1−λ) is called the speed of adjustment. Hence, if we estimate λ empirically, we know both the rate of decline and the speed of adjustment. The beauty of the Koyck method is that it allows us to estimate λ by a simple regression, as explained next. We derive the


Koyck model as follows. First write the autoregressive distributed lag model with autoregressive order zero and an infinite distributed lag order for the regressor as:

    ARDL(0, ∞): Yt = α +β0 Xt+β1Xt-1 + … + ut. (3.1.5)

Now substitute (3.1.4), the geometrically declining coefficients, in (3.1.5) to yield:

Yt = α + β0Xt + β0λXt-1 + β0λ^2Xt-2 + … + ut. (3.1.6)

Lag the above equation by one time period and multiply both sides by λ to yield:

λYt-1 = αλ + β0λXt-1 + β0λ^2Xt-2 + β0λ^3Xt-3 + … + λut. (3.1.7)

    Subtract (3.1.7) from (3.1.6) to yield ARDL(1,1) with autoregressive order 1 and a distributed lag order 1 also:

    Yt −λ Yt-1 = α (1−λ) + β0 Xt+ (ut − λut-1). (3.1.8)

The error term in (3.1.8) involves λ, which must be recognized in estimation. We can rewrite (3.1.8) as the estimable Koyck model:

Yt = α(1−λ) + λYt-1 + β0Xt + (ut − λut-1), (3.1.9)

where a lagged dependent variable is introduced. This creates some technical problems for reliable estimation and inference. For example, in place of the usual Durbin-Watson test for autocorrelation of residuals, we need to use the Durbin h test. Note that even if the maximum lag is infinite, depending on λ, the mean lag of the Koyck model need not be very long. The mean lag of the Koyck model is [Σ k βk]/[Σ βk], where both sums run from 0 to ∞. Verify that the mean lag simplifies to λ/(1−λ) because of a cancellation. For example, if λ = 0.5, the mean lag is only 1 time period, far less than infinity. The following R snippet implements the Koyck model for our housing starts data.

#R3.1.2 load housing starts (46 observations) and sales (36 obs.) data
starts=ts(starts, start=c(1960,1))
sales=ts(sales, start=c(1970,1))
library(dyn)
reg.koy=dyn$lm(sales~lag(sales,-1)+ starts)
print(c("lambda=", reg.koy$coef[2]), q=F)
lamd=reg.koy$coef[2]
print(c("mean lag = lamd / (1-lamd)", lamd/(1-lamd)), q=F)
# 31 years is too long as average lag, model is defective
print(c("alpha=", reg.koy$coef[1]/(1-reg.koy$coef[2])), q=F)
print(c("beta sub 0=", reg.koy$coef[3]), q=F)
summary(reg.koy) # starts as a regressor is not significant
library(car)
durbin.watson(reg.koy, max.lag=4)
#Durbin h statistic = autocorr(1)*sqrt(T/(1-T*var(coeff of lagged regressor)))
#is unit normal
acf1=as.numeric(acf(resid(reg.koy), plot=F)$acf[2])
sumr=summary(reg.koy)
se=sumr$coef[2,2]; se2=se^2
bigt=length(resid(reg.koy))
durbinh=acf1*sqrt(bigt/(1-bigt*se2))
durbinh
if (abs(durbinh)>1.96) print("regr residuals are autocorrelated")


Solow's Pascal lag (negative Binomial) uses the lag operator L and replaces (3.1.8) by

(1−λL)^r Yt = β(1−λ)^r Xt + εt. (3.1.10)

#R3.1.3
for (k in 3:10){
negbinwt=dnbinom(1:k, prob=.1, size=k)
# for Poisson, wt= dpois(x=1:k, lambda=1)
start2=lagwt(starts,negbinwt) #name has start not starts, ie, no s
n=length(sales); nn=length(start2)
starts2=start2[(nn-n+1):nn]
reg1=lm(sales~starts2)
rsq2=(cor(fitted(reg1),sales))^2
print(c(k,round(rsq2,6))) }

Jorgenson's rational distributed lag inserts a ratio of polynomials in the lag operator L. Most of the above models are not suitable for the US housing starts data.

Adaptive Expectations Model: How can we provide an economic theory behind the Koyck model? Let us use the adaptive expectations framework to rationalize the Koyck model with an explicit example from macroeconomics. Start with the model:

Yt = β0 + β1X*t + ut, (3.1.11)

where Y = demand for money (real cash balances) and X*t = expected long-run or 'normal' rate of interest (perhaps based on some equilibrium model representing an optimum value), which is not observable. The * indicates that X*t is not observable at time t, but its past values are known, and hence past errors in expectations (Xt − X*t-1) are observable. Now let γ denote the rate at which the system adapts to past errors in formulating current expectations X*t. The definition of γ leads to the relation:

X*t − X*t-1 = γ(Xt − X*t-1). (3.1.12)

The left side involves learning (adapting) from last period's expectation compared with the actual value. Rewriting this with the unknown on the left side gives X*t = X*t-1 + γ(Xt − X*t-1) = γXt + (1−γ)X*t-1. Substituting this in model (3.1.11) we have:

Yt = β0 + β1[γXt + (1−γ)X*t-1] + ut. (3.1.13)

Now lag the original model (3.1.11) by one period, multiply it by (1−γ), and subtract the product from (3.1.13) to yield the adaptive expectations model:

Yt = γβ0 + β1γXt + (1−γ)Yt-1 + ut − (1−γ)ut-1, (3.1.14)

where all variables are observable. The interpretation of (3.1.14) is not the same as that of (3.1.9) of the Koyck model, even though both have a lagged dependent variable on the right side. An obvious difference is that the coefficient (1−γ) of the lagged dependent variable is now related to the rate of adaptation to past errors.
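
The derivation can be checked numerically. The following is a minimal simulation sketch (an illustration, not part of the original text): it builds expectations by the adaptive rule (3.1.12), generates Y from (3.1.11), and fits the estimable form (3.1.14). All parameter values are arbitrary assumptions; the coefficient of the lagged dependent variable should come out near 1−γ.

#Hedged illustration: simulate adaptive expectations and recover gamma
set.seed(123)
n=500; gam=0.4; b0=2; b1=1.5      #assumed true values
x=cumsum(rnorm(n))                #an arbitrary driving series
xstar=numeric(n); xstar[1]=x[1]
for (t in 2:n) xstar[t]=gam*x[t]+(1-gam)*xstar[t-1]   #adaptive rule (3.1.12)
y=b0+b1*xstar+rnorm(n)            #model (3.1.11)
yt=y[2:n]; ylag=y[1:(n-1)]; xt=x[2:n]
reg.ae=lm(yt~xt+ylag)             #estimable form (3.1.14)
coef(reg.ae)                      #coefficient of ylag should be near 1-gam = 0.6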

Partial Adjustment Model: This model says that it takes time to adjust. Consider an example where the unobservable desired capital investment is a linear function of output:

    Y*t=β0+ β1Xt + ut. (3.1.15)


where the desired amount of capital investment is denoted by Y*t. The agent cannot achieve the desired investment level right away. Instead, actual investment follows the model:

Yt − Yt-1 = δ(Y*t − Yt-1), (3.1.16)

where the left side is the actual change and the right side is a fraction δ of the desired change. Write this as Yt = Yt-1 + δ(Y*t − Yt-1) = (1−δ)Yt-1 + δY*t. Upon replacing the unobservable Y*t by the right side of (3.1.15) we have the partial adjustment model:

Yt = (1−δ)Yt-1 + δ(β0 + β1Xt + ut). (3.1.17)

This is almost like the Koyck model. Note that an advantage of the partial adjustment framework is that the error term is simply δut, which is simpler than the error term ut − (1−γ)ut-1 under adaptive expectations shown in (3.1.14).
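
As a minimal estimation sketch (an illustration under assumptions, not part of the original snippets), the partial adjustment regression for the housing data can be fitted with dyn$lm, as done for model M6 in the next section, and δ, β0 and β1 recovered from (3.1.17); the starts and sales vectors of #R3.1.1 are assumed to be in memory.

#Hedged sketch: estimate (3.1.17) for the housing data and recover delta
library(dyn)
starts=ts(starts, start=c(1960,1))
sales=ts(sales, start=c(1970,1))
reg.pa=dyn$lm(sales~starts+lag(sales,-1))
co=coef(reg.pa)     #co[1]=intercept, co[2]=coeff of starts, co[3]=coeff of lagged sales
delta=1-co[3]       #coefficient of lagged y is (1-delta)
beta1=co[2]/delta   #short-run coefficient divided by delta
beta0=co[1]/delta
as.numeric(c(delta, beta0, beta1))  #delta, beta0, beta1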

    3.2 Economic interpretations of ARDL(1,1) Model

The autoregressive distributed lag model ARDL(p,q), with autoregressive order p and distributed lag order q, is defined as:

yt = β1zt + Σ(i=1 to p) β2i yt-i + Σ(j=1 to q) β3j zt-j + εt. (3.2.1)
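
For example (a hedged sketch, not from the original text), an ARDL(2,1) regression of housing sales on starts can be fitted with dyn$lm simply by adding the appropriate lag() terms, assuming the data of #R3.1.1 are in memory:

#Hedged sketch: an ARDL(2,1) fit of sales on starts using dyn$lm
library(dyn)
starts=ts(starts, start=c(1960,1))
sales=ts(sales, start=c(1970,1))
reg.ardl21=dyn$lm(sales ~ starts + lag(starts,-1) + lag(sales,-1) + lag(sales,-2))
summary(reg.ardl21)   #p=2 autoregressive lags, q=1 distributed lag of starts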

    Professor David Hendry of Oxford University and coauthors have collected an interesting list of eleven possible economic interpretations of the simple ARDL(1,1) model. We have mentioned this model before in the context of (3.1.8). It is convenient to place them in Table 3.1 with the first column designating various models as M1 to M11.

The unrestricted ARDL(1,1) model is defined as

M1: yt = β1zt + β2yt-1 + β3zt-1 + εt, (3.2.2)

where the intercept is suppressed for notational convenience without loss of generality. The ordering of the subscripts of β is a bit unusual; it is designed to ease the placement of constraints when we formally test the restrictions implicit in the other models.

#R3.1.4 load housing starts (46 observations) and sales (36 obs) data
starts=ts(starts, start=c(1960,1))
dmstarts=starts-mean(starts) #de-meaned series
sales=ts(sales, start=c(1970,1))
dmsales=sales-mean(sales)
library(dyn) #following regression forced thru origin, intercept=0
regM1=dyn$lm(dmsales~dmstarts+lag(dmsales,-1)+ lag(dmstarts,-1)-1)
summary(regM1) #only beta2 is significant

All special cases in Table 3.1 impose some restrictions on the coefficients of the unrestricted ARDL(1,1) of (3.2.2). Some special cases are trivial and require no further discussion. For example, the static regression is M2: yt = β1zt + εt, and the univariate time series is M3: yt = β2yt-1 + εt.

The special case called "differenced data," M4: ∆yt = β1∆zt + εt, is obtained by requiring that β2 = 1 and β3 = −β1.


Campbell and Mankiw (1991) consider M4 to illustrate a situation where β1 represents the proportion of consumers who are "income-constrained" and (1−β1) the remaining consumers who are "permanent income" consumers. The coefficient β2 = 1 means that when the consumer is out of equilibrium she will stay there. Also, M4 may be interpreted as a growth rate model. It has the counter-intuitive interpretation that consumers do not try to remove disequilibria in the "level" of the variable. See model M10 below to understand the equilibrium interpretation. A variable z is said to be a leading indicator for forecasting y if the model M5 holds: yt = β3zt-1 + εt.

Partial adjustment models, M6, have one autoregressive lag and no distributed lag, ARDL(1,0), with yt = β1zt + β2yt-1 + εt. This model imagines an economic agent with a target level for y denoted by y* and a one-period quadratic cost function:

Ct = (yt − y*t)^2 + ψ(yt − yt-1)^2, (3.2.3)

which incorporates the cost of not meeting the target plus a cost of adjustment whenever the variable changes. Note that ψ determines the partial adjustment coefficient. When the economic agent minimizes this cost, the first order condition (FOC) yields

    (yt−yt*)+ψ (yt−yt-1)= 0. (3.2.4)

Now we can rewrite the FOC as (yt − yt-1) − (1+ψ)^-1 [y*t − yt-1] = 0, which can be further simplified as

    yt = θ yt-1 + (1−θ) y*t where θ =ψ /(1+ψ) for ψ>0. (3.2.5)

The regression yt = β1zt + β2yt-1 + εt of M6 suffers from a somewhat subtle problem which deserves discussion. Denoting the sample estimate of β2 by b2, verify that the mean lag is b2/(1−b2), which is not defined when b2 = 1. This creates a problem when b2 ≈ 1. That is, we have a failure in the joint relationship between the two variables (cointegration) arising from the choice of an incorrect specification.

    One can consider an inter-temporal optimization problem and introduce an expectation operator in (3.2.4). The solution to this using Wiener-Hopf methods is derived in Vinod (1996) using frequency domain optimization methods.

Common factor and autoregressive error, M7, has the specification yt = β1zt + ut, where the errors are autoregressive, ut = β2ut-1 + εt, and where the implied coefficient of zt-1 obeys the nonlinear relation β3 = −β1β2. Why is it called a common factor model? Substituting the AR(1) error into the specification gives (1−β2L)yt = β1(1−β2L)zt + εt, so the common factor is (1−β2L), where L is the lag operator. The common factor operating on both sides of the equation seems to lack an economic interpretation.

#R3.1.5 reload model M1 run above
dmsalesLag1=dmsales[1:35] #unfortunately explicit def needed here
dms=dmsales[2:36] #dependent variable has 35 observations
dmst=dmstarts[12:46] #must start at 12 to get 35 observations right
dmstartsLag1=dmstarts[11:45] #lagged starts at 11
#cbind(dms,dmsalesLag1,dmst,dmstartsLag1) #check lags are correct


#nonlinear regr needs starting values for beta1 and beta2
#let us use model M1 results
#bet1=-0.05350598; bet2= 0.94759864; bet3=-0.02123725
#nls is the program for nonlinear fit in R
regM7=nls(dms~bet1*dmst+bet2*dmsalesLag1-(bet1*bet2)*dmstartsLag1,
 start=list(bet1=0.0535, bet2=0.9476))
su=summary(regM7) #Common Factor (Autoregressive error)
su
ebet1=su$coef[1] #estimated beta 1
ebet2=su$coef[2] #estimated beta 2
fitt=ebet1*dmst+ebet2*dmsalesLag1-(ebet1*ebet2)*dmstartsLag1
#fitt denotes fitted value of the dependent variable
#adjusted Rsquare usually= 1-(1-Rsquare)*((n-1)/(n-k-1))
cor1=cor(dms,fitt) #simple correlation between dependent variable and fitted
Rsquare=cor1^2
adjRsq= 1-(1-Rsquare)*(34/(34-2))
print(c("adjusted R squared", adjRsq), q=F)

The one-lag distributed lag model, M8, has the specification yt = β1zt + β3zt-1 + εt. For example, y can be the output of coffee and z the number of trees planted in the previous two years. However, this specification is unrealistic when one considers that farmers will be cognizant of market forces, inventories, prices, related prices, etc. Another example is the completion of dwellings, given the past history of housing starts (zt). The problem with that specification is that a housing start only influences its own completion, not that of other houses.

Dead start, M9, has the specification yt = β2yt-1 + β3zt-1 + εt. Hall's (1978) random walk in consumption is a dead start model. Under the permanent income hypothesis, if we also assume rational expectations, the change in consumption must be an innovation (random walk) according to Hall. The corresponding data generating process (DGP) will be Δyt = εt. This is unrealistic, since it allows the possibility of negative consumption. One should be careful to somehow rule out negative consumption before using such a model.

Homogeneous and general equilibrium correction models, M10 and M11, are best considered together. The long-run solution is static in the sense that it does not change over time, which may be called an equilibrium. For example, in a bivariate equilibrium we have Eyt = y* and Ezt = z* in terms of expected values, assuming that the process does possess a steady state. Consider the equilibrium or expected value of the ARDL(1,1) model of (3.2.2). We have

    y*=β1z*+β2y*+ β3z*. (3.2.6)

Moving the y* terms to the left side, we have y* = (1−β2)^-1 (β1 + β3) z*. If we write y* = K1z*, this defines K1 = (β1 + β3)/(1−β2). This requires β2 ≠ 1, but imposes no further condition on the coefficients, and gives the general equilibrium correction model M11. The homogeneous equilibrium solution says that in equilibrium y and z are proportional to each other (e.g., consumption and income). If there is any shock or disturbance to the long-run relation in one period, it can induce some changes during the next period. The M11 specification can then be written as:


Δyt = β1Δzt + (β2−1)(yt-1 − K1zt-1) + εt. (3.2.7)

Verify that if the beta coefficients of the ARDL(1,1) are such that K1 = 1, we obtain model M10 by replacing K1 = 1 in (3.2.7). In our numerical example with US housing starts as z and sales as y, the estimated coefficients are β1 = −0.05350598, β2 = 0.94759864, and β3 = −0.02123725, so that K1 = (β1 + β3)/(1−β2) = −1.426360. However, since only β2 is statistically significantly different from zero, K1 = 0 ≠ 1. Hence the homogeneous model M10 is not the choice here. A simpler way to reject M10 is to check that the beta coefficients do not sum to unity.

Table 3.1: Economic interpretations of the ARDL(1,1) model

M1   Autoregressive-Distributed Lag:         yt = β1zt + β2yt-1 + β3zt-1 + εt.        Restrictions: none.
M2   Static Regression:                      yt = β1zt + ut.                          Restrictions: β2 = β3 = 0.
M3   Univariate Time Series:                 yt = β2yt-1 + ωt.                        Restrictions: β1 = β3 = 0.
M4   Differenced Data:                       ∆yt = β1∆zt + ςt.                        Restrictions: β2 = 1, β3 = −β1.
M5   Leading Indicator:                      yt = β3zt-1 + υt.                        Restrictions: β1 = β2 = 0.
M6   Partial Adjustment:                     yt = β1zt + β2yt-1 + ηt.                 Restrictions: β3 = 0.
M7   Common Factor (Autoregressive error):   yt = β1zt + ut, ut = β2ut-1 + et.        Restrictions: β3 = −β1β2.
M8   Finite Distributed Lag:                 yt = β1zt + β3zt-1 + ξt.                 Restrictions: β2 = 0.
M9   Dead Start:                             yt = β3zt-1 + β2yt-1 + υt.               Restrictions: β1 = 0.
M10  Homogeneous Equilibrium Correction:     ∆yt = β1∆zt + (β2−1)(yt-1 − zt-1) + υt.  Restrictions: β1 + β2 + β3 = 1.
M11  General Equilibrium Correction:         ∆yt = β1∆zt + (β2−1)(yt-1 − K1zt-1) + εt.  Restrictions: none.

Note to model M11: K1 = (β1 + β3)/(1 − β2), defined when β2 ≠ 1.

Instead of the single lag term in M8, one can certainly consider q > 1 lags in an ARDL(0,q) model. Then we have model M12:


yt = β0 + Σ(j=0 to q) β3j zt-j + εt. (3.2.8)

Since this is not a special case of ARDL(1,1), it is absent from Table 3.1. However, it can be a worthy choice in many applied situations. The important issue of choosing between these models is discussed at the end of this section.

Impact, Intermediate-term and Long-run Multipliers: The ARDL(0,q) model M12 of (3.2.8) needs further discussion. If y is consumption and z is income, we have a consumption function here. Then the coefficient of current income (β30 in the notation of (3.2.8), or β0 in the Koyck notation) is the impact multiplier: a unit change in income z at the initial time, held constant forever after, leads to that much change in the (mean value of) consumption in the same period. Note that for each lag j, the partial derivative (∂ consumption)/(∂ income lagged j periods) equals the corresponding lag coefficient. The partial sum of a few of these lagged derivatives is called the 'intermediate-term' multiplier, and the sum of all lag coefficients (an infinite series) is known as the long-run multiplier. For the Koyck model the sum of coefficients as q → ∞ is simply β0(1 + λ + λ^2 + …), so the long-run multiplier becomes β0(1−λ)^-1 if |λ| < 1.
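
A minimal sketch (not part of the original text) of how these multipliers can be computed from the Koyck regression reg.koy of snippet #R3.1.2, assuming that object is still in memory:

#Hedged sketch: multipliers implied by the Koyck fit reg.koy of #R3.1.2
lamd=as.numeric(reg.koy$coef[2])    #estimate of lambda (coeff of lagged sales)
beta0=as.numeric(reg.koy$coef[3])   #impact multiplier (coeff of current starts)
LRmult=beta0/(1-lamd)               #long-run multiplier beta0/(1-lambda)
intermed=beta0*cumsum(lamd^(0:4))   #intermediate-term multipliers for 0 to 4 lags
c(impact=beta0, longrun=LRmult)
intermed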

Consider next a model in which y depends on an expectational variable:

yt = f(x*t), x*t = Et-1 xt. (3.2.9)

The long-run equilibrium expectational variable x*t is not necessarily directly observable. In the adaptive framework the expectation adapts to changing conditions over time. At time t, one knows the actual lagged x values, so the error (xt-1 − x*t-1) in last period's expectation is observable. From this error experience, the agent presumably learns (adapts). How exactly she learns is given by the assumed relation for expectation formation:

x*t = γxt + (1−γ)x*t-1, where 0 ≤ γ ≤ 1, (3.2.10)

where γ is a weight representing the speed of adjustment. Note that when γ = 0 we have static expectations which never change, and when γ = 1 expectations adapt fully and almost immediately.

In principle, rational expectations mean that economic agents take all available information into account, without committing to any particular model for the mechanical formulation of expectations. Thus the value γ = 0 will not fit the RE framework. Hence a statistical test of γ = 0 might serve as a test bearing on the rational expectations model.

Statistical Inference and Estimation with Lagged Dependent Variables: OLS is consistent, though biased, for yt = β1zt + β2yt-1 + β3zt-1 + εt. Instrumental variable estimation is possible, but involves additional data, Maddala and Kim (1998). According to Durbin's (1960) estimating functions viewpoint, OLS is optimal despite the lagged dependent variable. But if, in addition to the presence of yt-1 on the right side, there is a further problem of autocorrelated regression errors, then it too can be handled by the Durbin (1960) two-step estimator described in Gujarati (1995, p. 432) and elsewhere. The first step is to regress yt on zt, zt-1 and yt-1, and estimate the coefficient b2 of β2 associated with the lagged regressor yt-1. The second step is to regress the pseudo first differences y*t = yt − b2 yt-1 on the similarly defined x*t = xt − b2 xt-1. Since this process has optimality properties according to theoretical results related to reaching the Cramer-Rao lower bound on variance, Vinod (1996, 2000) suggests a third step for inference, as follows.

Vinod's Step 3: Denote the residuals of the second step above as ut and the residual variance by s2. The score function for β is defined from the partial derivative of the log of the likelihood function; it can be shown to be (1/s2) Σ ut x*t, where x*t is the pseudo difference using b2 from the second step. Next, define the Godambe pivot function GPF = (X*′X*)^-0.5 X*′u. This represents p equations if there are p regressors and can be solved for the p coefficients as roots. Next, use a bootstrap to resample u and compute 999 values of the roots of the equation GPF = 0 for β. These solutions are then used for confidence intervals and hence for statistical inference.
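
A minimal sketch of the Durbin two-step portion of this procedure (an illustration, not the author's code; the GPF bootstrap of step 3 is not shown), assuming the de-meaned housing series of #R3.1.4 are in memory:

#Hedged sketch: Durbin (1960) two-step estimator for the housing data
#Step 1: regress y on z, lagged z and lagged y; pick up b2 = coeff of lagged y
y=as.numeric(dmsales)
z=as.numeric(dmstarts)[11:46]   #align starts with the 36 sales years 1970-2005
n=length(y)
step1=lm(y[2:n] ~ z[2:n] + z[1:(n-1)] + y[1:(n-1)])
b2=coef(step1)[4]               #coefficient of lagged y
#Step 2: pseudo first differences using b2
ystar=y[2:n]-b2*y[1:(n-1)]
zstar=z[2:n]-b2*z[1:(n-1)]
step2=lm(ystar ~ zstar)
summary(step2)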

Interpretation problems in the presence of expectational variables: McCallum (JME, 1984, pp. 3-14) associates these problems with long-run effects, but Banerjee et al. (1993, p. 65) show that the problem is really due to an invalid weak exogeneity assumption, and that the same problem can be present in short-run elasticity estimation as well. McCallum's example is the estimation of a regression of the interest rate ιt on inflation πt as

    ιt=β0 + β1πt + εt. (3.2.11)


The Fisher hypothesis is that β1 = 1, i.e., in long-run equilibrium the nominal interest rate reflects the inflation rate one-for-one. Now suppose the data are generated through an expectational variable as ιt = β0 + β1πt|t-1 + εt, where inflationary expectations are denoted by πt|t-1. Let πt be an AR(1) process with AR coefficient µ1, |µ1| < 1. Using the AR(1) process to form the inflationary expectations leads to the estimable equation ιt = β0 + β1[µ0 + µ1πt] + εt, (3.2.12) which reappears in Section 3.4.

The median lag is the first lag at which the normalized sum of weights exceeds 0.5. It can be shown that the median lag equals −log[2(1−β1)]/log β2, provided this expression is well defined (e.g., the logs are of positive numbers).
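
A small helper (a hypothetical illustration coding the formula just quoted):

#Hedged helper: median lag = -log(2*(1-beta1))/log(beta2), when defined
median.lag=function(beta1, beta2){
if (beta2<=0 | 2*(1-beta1)<=0) return(NA)  #formula not defined for these values
-log(2*(1-beta1))/log(beta2) }
median.lag(beta1=0.3, beta2=0.7)  #illustrative coefficient values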

    Choosing between M1 to M12 Specifications: An R program to estimate all models M1 to M11 is as follows.

#R3.1.6 R program to compute all models M1 to M11
# Use R3.1.1 to get the housing starts and sales data into memory
starts=ts(starts, start=c(1960,1))
dmstarts=starts-mean(starts) #dm = de-meaned series
sales=ts(sales, start=c(1970,1))
dmsales=sales-mean(sales) #subtract mean to avoid the intercepts
nam=rep(0,11)
nam[1]="Autoregressive-Distributed Lag"
nam[2]="Static Regression"
nam[3]="Univariate Time Series"
nam[4]="Differenced Data"
nam[5]="Leading Indicator"
nam[6]="Partial Adjustment"
nam[7]="Common Factor (Autoregressive error)"
nam[8]="Finite Distributed Lag"
nam[9]="Dead Start"
nam[10]="Homogeneous Equilibrium Correction"
nam[11]="General Equilibrium Correction"
#number of restrictions by model number
nrestr=c(0,2,2,2,2,1,3,1,1,3,0)
# F=[(ReRSS-UnRSS)/r] / [UnRSS/df]
adjRsq=rep(NA,11)
RSS=rep(NA,11)
Fstat=rep(NA,11)
Fcrit=rep(NA,11)
library(dyn) #following regression forced thru origin, intercept=0
regM1=dyn$lm(dmsales~dmstarts+lag(dmsales,-1)+ lag(dmstarts,-1)-1)
su=summary(regM1)
nam[1]; su
adjRsq[1]=su$adj
RSS[1]=su$df[2]*((su$sigma)^2)
UnRSS=RSS[1]
refdf=su$df[2]
for (i in 2:10) { Fcrit[i]=qf(0.95,df1=nrestr[i],df2=refdf)}

regM2=dyn$lm(dmsales~dmstarts-1)
su=summary(regM2)
nam[2]; su
adjRsq[2]=su$adj
RSS[2]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[2]
Fstat[2]=Fnum/ (UnRSS/refdf)

regM3=dyn$lm(dmsales~lag(dmsales,-1)-1)
su=summary(regM3)
nam[3]; su
adjRsq[3]=su$adj


RSS[3]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[3]
Fstat[3]=Fnum/ (UnRSS/refdf)

regM4=dyn$lm(diff(dmsales)~diff(dmstarts)-1)
su=summary(regM4)
nam[4]; su
adjRsq[4]=su$adj
RSS[4]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[4]
Fstat[4]=Fnum/ (UnRSS/refdf)

regM5=dyn$lm(dmsales~lag(dmstarts,-1)-1)
su=summary(regM5)
nam[5]; su
adjRsq[5]=su$adj
RSS[5]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[5]
Fstat[5]=Fnum/ (UnRSS/refdf)

regM6=dyn$lm(dmsales~dmstarts+lag(dmsales,-1)-1)
su=summary(regM6) #partial adjustment
nam[6]; su
adjRsq[6]=su$adj
RSS[6]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[6]
Fstat[6]=Fnum/ (UnRSS/refdf)

#NONLINEAR estimation: dyn$lm fails, so explicit data definitions needed here
dmsalesLag1=dmsales[1:35]
dms=dmsales[2:36] #dependent variable has 35 observations
dmst=dmstarts[12:46] #must start at 12 to get 35 observations right
dmstartsLag1=dmstarts[11:45] #lagged starts at 11
regM7=nls(dms~bet1*dmst+bet2*dmsalesLag1-(bet1*bet2)*dmstartsLag1,
 start=list(bet1=0.0535, bet2=0.9476))
su=summary(regM7) #Common Factor (Autoregressive error)
nam[7]; su
ebet1=su$coef[1] #estimated beta 1
ebet2=su$coef[2] #estimated beta 2
fitt=ebet1*dmst+ebet2*dmsalesLag1-(ebet1*ebet2)*dmstartsLag1
#fitt denotes fitted value of the dependent variable
#adjusted Rsquare defined as 1-(1-Rsquare)*((n-1)/(n-k-1))
cor1=cor(dms,fitt) #simple correlation between dependent variable and fitted
Rsquare=cor1^2
adjRsq[7]= 1-(1-Rsquare)*(34/(34-2))
RSS[7]=sum((dms-fitt)^2)
Fnum=(RSS[7]-UnRSS)/nrestr[7]
Fstat[7]=Fnum/ (UnRSS/refdf)

regM8=dyn$lm(dmsales~dmstarts+ lag(dmstarts,-1)-1)
su=summary(regM8) #Finite Distributed Lag
nam[8]; su
adjRsq[8]=su$adj
RSS[8]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[8]


Fstat[8]=Fnum/ (UnRSS/refdf)

regM9=dyn$lm(dmsales~lag(dmsales,-1)+ lag(dmstarts,-1)-1)
su=summary(regM9) #Dead Start
nam[9]; su
adjRsq[9]=su$adj
RSS[9]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[9]
Fstat[9]=Fnum/ (UnRSS/refdf)

regM10=dyn$lm(diff(dmsales)~dmstarts+lag((dmsales-dmstarts),-1)-1)
su=summary(regM10) #Homogeneous Equilibrium Correction
nam[10]; su
adjRsq[10]=su$adj
RSS[10]=su$df[2]*((su$sigma)^2)
Fnum=(su$df[2]*((su$sigma)^2)-UnRSS)/nrestr[10]
Fstat[10]=Fnum/ (UnRSS/refdf)

#regM11=dyn$lm(diff(dmsales)~dmstarts+lag((dmsales-bigk*dmstarts),-1)-1)
regM11=regM1 #model M11 imposes no restriction, equivalent to M1
bigk=(regM1$coef[1]+regM1$coef[3])/(1-regM1$coef[2])
print(" Model M11 is same as the unrestricted model M1, with K=")
as.numeric(bigk)
su=summary(regM11) #General Equilibrium Correction
nam[11]; su
adjRsq[11]=su$adj
RSS[11]=su$df[2]*((su$sigma)^2)
print(" Model, Adjusted R-square, Residual Sum of Squares")
cbind(nam, round(adjRsq,5), round(RSS,0))

    print(" Model, F statistic, Critical Value of F, Num df")cbind(nam, round(Fstat,6), round(Fcrit,6), nrestr)

After running the ARDL regressions we want to compare the models. Which model has the best fit to the data as judged by the adjusted R2? The following table reports the adjusted R2 for M1 to M11 as produced by the program. It appears that only the nonlinear M7 has a higher adjusted R2 than the unrestricted model M1.

No.  Model name                              adjusted R-sq   residual sum of sq
1]   Autoregressive-Distributed Lag               0.94794            62259
2]   Static Regression                            0.34636           866573
3]   Univariate Time Series                       0.77569           285005
4]   Differenced Data                             0.76803            64258
5]   Leading Indicator                            0.11334          1175513
6]   Partial Adjustment                           0.82855           211439
7]   Common Factor (Autoregressive error)         0.95881            62964
8]   Finite Distributed Lag                       0.32831           865068
9]   Dead Start                                   0.79355           254598
10]  Homogeneous Equilibrium Correction           0.57451           114397
11]  General Equilibrium Correction               0.94794            62259


One can also test the restrictions formally by statistical testing methods. When a restriction is nonlinear, methods for "testing nonlinear restrictions" are available in econometric texts, e.g., Greene (2000, 4th ed., p. 298). However, in the present context it would be simpler to use the following approximate F test. The unrestricted model has residual sum of squares denoted by UnRSS. The alternative models involve r restriction(s) from the last column of Table 3.1, with residual sum of squares denoted by ReRSS. The test statistic is:

F(r, df) = [(ReRSS − UnRSS)/r] / [UnRSS/(n−k)], (3.2.17)

where (n−k) denotes the degrees of freedom (df) of the unrestricted model. Note that ReRSS > UnRSS; that is, restrictions always increase the RSS or worsen the fit. The testing issue is whether that worsening is statistically significant, i.e., whether the test statistic exceeds the critical value from the F table with the indicated df. From the following results it appears that the restrictions do significantly worsen the fit for models M2, M3, M5, M6, M8, M9 and M10.

No.  Model Description                        F statistic   Crit.Val.   r
[2]  Static Regression                         206.699433    3.294537   2
[3]  Univariate Time Series                     57.243113    3.294537   2
[4]  Differenced Data                            0.513648    3.294537   2
[5]  Leading Indicator                         286.093588    3.294537   2
[6]  Partial Adjustment                         76.675281    4.149097   1
[7]  Common Factor (Autoregressive error)        0.120788    2.90112    3
[8]  Finite Distributed Lag                    412.625689    4.149097   1
[9]  Dead Start                                 98.857678    4.149097   1
[10] Homogeneous Equilibrium Correction          8.932611    2.90112    3

Another possibility for comparing the models is out-of-sample forecasting. That is, estimate the models after leaving out, say, 10% of the available data (usually at the end), compute forecasts for those observations, compare these forecasts with the known true values, and finally choose the model with the smallest mean squared forecast error.
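
A minimal holdout sketch along these lines (an illustration, not from the original text), comparing the static model M2 with the partial adjustment model M6 on the last four years of the de-meaned housing data assumed to be in memory from #R3.1.4:

#Hedged sketch: out-of-sample comparison of M2 (static) and M6 (partial adjustment)
y=as.numeric(dmsales)
z=as.numeric(dmstarts)[11:46]     #align starts with the 36 sales years
n=length(y); hold=(n-3):n         #last 4 observations held out
est=2:(n-4)                       #estimation sample (starts at 2 because of the lag)
m2=lm(y[est] ~ z[est] - 1)
m6=lm(y[est] ~ z[est] + y[est-1] - 1)
f2=coef(m2)*z[hold]                              #M2 forecasts
f6=coef(m6)[1]*z[hold] + coef(m6)[2]*y[hold-1]   #one-step-ahead M6 forecasts
c(mseM2=mean((y[hold]-f2)^2), mseM6=mean((y[hold]-f6)^2))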

We conclude this section by noting that a rich variety of economic insights can be obtained by using the ARDL(1,1) model and its extensions. The beauty of this part of econometrics is that one can empirically assess which particular model fits a given data set. For example, the estimated housing starts and sales model can be tested for the restrictions of some ten alternative models.

3.3 Stochastic Diffusion Models for Prices

The discrete movement of any spot price (S) over time can be studied by using what natural scientists call the diffusion equation, which combines the average return µ and the volatility measured by the standard deviation σ.

    ∆S/S = µ ∆t + σ z (3.3.1)

where ∆ is the difference operator (∆S = ∆St = St − St−∆t), S is the spot price, ∆t is the time duration and z is a N(0,1) variable. Note that ∆S/S is the relative change in the spot price, and the relative change times 100 is the percentage change. Eq. (3.3.1) seeks to explain


how relative changes are diffused around their average as time passes, subject to random variation.

    The continuous time diffusion equation is

    dS/S = µ dt + σ dz (3.3.2)

where d denotes an instantaneous change. The dz in (3.3.2) represents a standard Wiener process or Brownian motion, Campbell et al (1997, p. 344), which is a continuous time analogue of the random walk mentioned in chapter 2. The process has a N(0, dt) distribution with 0 ≤ dt ≤ 1 (i.e., the first observation in the sequence has dt = 0 and the last observation has dt = 1). If one wishes to incorporate weekend effects in daily data, one can use so-called Julian calendar dates and count dt as 2 or 3 days for the weekends. The first term of (3.3.2) is called the drift part and the second term the diffusion part in the natural science literature where these ideas originally come from.
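
A hedged simulation sketch (not part of the original text) of the discrete diffusion (3.3.1), with purely illustrative values of µ, σ and the time step; the shock is scaled by √∆t so that σz√∆t mimics σdz with dz ~ N(0, ∆t) as described above.

#Hedged sketch: simulate a price path from the discrete diffusion (3.3.1)
set.seed(99)
nsteps=250; dt=1/250          #roughly one year of daily steps (assumed)
mu=0.08; sigma=0.2            #illustrative drift and volatility
S=numeric(nsteps+1); S[1]=100 #assumed starting price
for (t in 1:nsteps){
z=rnorm(1)                    #standard normal shock
S[t+1]=S[t]*(1+mu*dt+sigma*sqrt(dt)*z)  #dS/S = mu*dt + sigma*sqrt(dt)*z
}
plot(S, type="l", main="Simulated diffusion path")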

Market prices often go through upward or downward phases, with rapid price changes and high volatility, as well as stagnant phases. In the basic diffusion model (3.3.1), however, the standard deviation σ is assumed to be constant over time.

A more general continuous time diffusion model reflects the fact that the drift and the volatility are both potentially functions of the price level and time:

    dS= µ(t, S) dt + σ(t, S) dz (3.3.3)

    A fairly comprehensive parametric specification allowing changes in both drift and volatility is in Chan et al (1992):

    dS = (α + β S) dt + σ Sγ dz (3.3.4)

The Chan et al model is flexible and encompasses several common diffusion models. For example, in the drift term, if we have α = η and β = −η, there is said to be mean reversion in prices. If α = 0 and γ = 1, it is a continuous version of the random walk with drift given in (3.3.1). When γ > 1, volatility is highly sensitive to the level of S.

    Depending on the restrictions imposed on the parameters α, β, and γ in equation (3.3.4) one obtains several nested models. The table below shows the nine models (including the unrestricted) considered here and explicitly indicates parameter restrictions for each model.

Table 3.3.2: Parameter Restrictions on the Diffusion Model dS = (α + βS) dt + σ S^γ dz

Model Name            Restrictions              Diffusion Model
1. Brennan-Schwartz   γ = 1                     dS = (α + βS) dt + σS dz
2. CIR SR             γ = 0.5                   dS = (α + βS) dt + σS^0.5 dz
3. Vasicek            γ = 0                     dS = (α + βS) dt + σ dz
4. CIR VR             α = 0, β = 0, γ = 1.5     dS = σS^1.5 dz
5. Dothan             α = 0, β = 0, γ = 1       dS = σS dz
6. CEV                α = 0                     dS = βS dt + σS^γ dz
7. GBM                α = 0, γ = 1              dS = βS dt + σS dz
8. Merton             β = 0, γ = 0              dS = α dt + σ dz
(Unrestricted)        none                      dS = (α + βS) dt + σS^γ dz

The first three models impose no restrictions on either α or β. Models 4 and 5 set both α and β equal to zero, while Models 6, 7 and 8 set either α or β equal to zero. Model 1, used by Brennan and Schwartz (1980), implies that the conditional volatility of changes in S is proportional to its level. Model 2 is the well-known square root model of Cox, Ingersoll and Ross (CIR) (1985), which implies that the conditional volatility of changes in S is proportional to the square root of the level. Model 3 is the Ornstein-Uhlenbeck diffusion process first used by Vasicek (1977); the implication of this specification is that the conditional volatility of changes in S is constant. Model 4, used by CIR (1980) and by Constantinides and Ingersoll (1984), indicates that the conditional volatility of changes in S is highly sensitive to the level of S. Model 5 was used by Dothan (1978), Model 6 is the constant elasticity of variance (CEV) process proposed by Cox (1975) and Cox and Ross (1976), Model 7 is the famous geometric Brownian motion (GBM) process first used by Black and Scholes (1973), and finally Model 8 is used by Merton (1973) to represent Brownian motion with drift.

This flexible parametric specification of Chan et al is useful since the parameters may be estimated to determine the model that best fits a particular price series. Vinod and Samanta (1997) estimate all these models to study the nature of exchange rate dynamics using daily, weekly and monthly rates for the British Pound and the Deutsche Mark for the 1975-1991 period, and compare the out-of-sample forecasting performance of the models in Table 3.3.2. The models Brennan-Schwartz, CIR-SR and Vasicek performed poorly, whereas CIR-VR and GBM were generally the best.
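
Before the estimation snippet below, the following self-contained hedged sketch shows the general approach: given a price series, form the change dy, the lagged level y2 and a standard normal proxy dz for the Wiener increment, then fit (3.3.4) and a restricted version with nls(). All numbers here are illustrative assumptions, not the exchange rate data used in the text.

#Hedged sketch: fit dS = (a + b*S)dt + s*S^g*dz by nls on constructed data
set.seed(11)
n=1000
y2=runif(n, 80, 120)               #illustrative lagged price levels (assumed)
dz=rnorm(n)                        #proxy for the Wiener increment
a0=0.5; b0=-0.004; s0=0.02; g0=1   #assumed true parameter values
dy=(a0+b0*y2) + s0*(y2^g0)*dz + rnorm(n, sd=0.1)  #constructed price changes
unrestr=nls(dy ~ (a+b*y2) + s*(y2^g)*dz, start=list(a=0, b=0, g=0.8, s=0.05))
summary(unrestr)                   #estimates should be near the assumed values
#a restricted version, e.g. Brennan-Schwartz imposes g=1:
brennanSch=nls(dy ~ (a+b*y2) + s*y2*dz, start=list(a=0, b=0, s=0.05))
c(sigma.unrestr=summary(unrestr)$sigma, sigma.bs=summary(brennanSch)$sigma)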

#R3.1.6
library(tseries)
x0

#calculate dt which is change in time
nn=length(y)
#dt=(1:nn)/nn; head(dt); tail(dt)
set.seed(345)
#dz

#OUTPUTS
sum1=summary(unrestr); sum1; ans[1]=sum1$sigma

    Formula: dy ~ (a + b * y2) + s * (y2^g) * dz

Parameters:
    Estimate  Std. Error t value Pr(>|t|)
a -0.0024686   0.0012658  -1.950   0.0512 .
b  0.0015462   0.0007603   2.034   0.0421 *
g -5.6776645  96.5373675  -0.059   0.9531
s  0.0002323   0.0095317   0.024   0.9806
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.006845 on 3341 degrees of freedom

> brennanSch
> sum1=summary(brennanSch); sum1; ans[2]=sum1$sigma

    Formula: dy ~ (a + b * y2) + s * (y2) * dz


Parameters:
    Estimate Std. Error t value Pr(>|t|)
a -2.470e-03  1.265e-03  -1.952   0.0511 .
b  1.547e-03  7.601e-04   2.035   0.0419 *
s -1.310e-05  7.247e-05  -0.181   0.8566
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.006844 on 3342 degrees of freedom

> cirsr
> sum1=summary(cirsr); sum1; ans[3]=sum1$sigma

    Formula: dy ~ (a + b * y2) + s * sqrt(y2) * dz

Parameters:
    Estimate Std. Error t value Pr(>|t|)
a -2.469e-03  1.265e-03  -1.951   0.0511 .
b  1.547e-03  7.601e-04   2.035   0.0419 *
s -1.273e-05  9.368e-05  -0.136   0.8919
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.006844 on 3342 degrees of freedom

> vasi
> sum1=summary(vasi); sum1; ans[4]=sum1$sigma

    Formula: dy ~ (a + b * y2) + s * dz

Parameters:
    Estimate Std. Error t value Pr(>|t|)
a -2.469e-03  1.265e-03  -1.951   0.0512 .
b  1.546e-03  7.601e-04   2.034   0.0420 *
s -1.124e-05  1.206e-04  -0.093   0.9258
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.006844 on 3342 degrees of freedom

> cirvr
> sum1=summary(cirvr); sum1; ans[5]=sum1$sigma

    Formula: dy ~ s * (y2^1.5) * dz

Parameters:
    Estimate Std. Error t value Pr(>|t|)
s -1.123e-05  5.584e-05  -0.201     0.84

    Residual standard error: 0.006847 on 3344 degrees of freedom

> dothan
> sum1=summary(dothan); sum1; ans[6]=sum1$sigma

    Formula: dy ~ s * (y2) * dz


Parameters:
    Estimate Std. Error t value Pr(>|t|)
s -1.132e-05  7.249e-05  -0.156    0.876

    Residual standard error: 0.006847 on 3344 degrees of freedom

> cev
> sum1=summary(cev); sum1; ans[7]=sum1$sigma

    Formula: dy ~ (b * y2) + s * (y2^g) * dz

Parameters:
    Estimate Std. Error t value Pr(>|t|)
b  6.993e-05  7.112e-05   0.983    0.326
g -5.468e+00  9.945e+01  -0.055    0.956
s  2.034e-04  8.642e-03   0.024    0.981

    Residual standard error: 0.006848 on 3342 degrees of freedom

> gbm
> sum1=summary(gbm); sum1; ans[8]=sum1$sigma

    Formula: dy ~ (b * y2) + s * (y2) * dz

Parameters:
    Estimate Std. Error t value Pr(>|t|)
b  7.013e-05  7.112e-05   0.986    0.324
s -1.211e-05  7.250e-05  -0.167    0.867

    Residual standard error: 0.006847 on 3343 degrees of freedom

> merton
> sum1=summary(merton); sum1; ans[9]=sum1$sigma

    Formula: dy ~ a + s * dz

Parameters:
    Estimate Std. Error t value Pr(>|t|)
a  9.453e-05  1.184e-04   0.798    0.425
s -9.642e-06  1.206e-04  -0.080    0.936

    Residual standard error: 0.006848 on 3343 degrees of freedom

> ans
[1] 0.006845415 0.006844379 0.006844393 0.006844403 0.006847209 0.006847225
[7] 0.006848287 0.006847253 0.006847616
> order(ans) #model 4 = Vasicek has the best fit
[1] 2 3 4 1 5 6 8 9 7
>

    3.4 Spurious Regression (R2 > DW) and Cointegration


DEFINITION (Integration of order d): If a nondeterministic time series has a stationary invertible ARMA representation after differencing d times, it is said to be integrated of order d, denoted yt ~ I(d); that is, (1−L)^d yt is stationary, or I(0).

According to this definition, when d = 0 the series yt is stationary in the levels of its values, and when d = 1 it is the change in the levels from one time period to the next that is stationary. A simple example of an I(0) series is white noise, a time series sample generated from the normal distribution N(0,2). An example of an I(1) model is the random walk model (2.5.7), yt = yt-1 + at. Now consider the case of I(1) data arising on both sides of the regression equation. This is where one must deal with the possibility of spurious regression, when the slope coefficient is highly significant (very large t statistic) and the multiple correlation coefficient R2 exceeds the Durbin-Watson statistic (DW). In these circumstances it can be proved that the t-test is over-optimistic in accepting a non-zero slope. This is called the spurious regression problem. A simple experiment reveals that if x ~ I(1) and y ~ I(1) are both independent random walks, a regression of y on x should have zero slope. Yet such regressions can have all kinds of slopes, large t-values and bizarre ranges of R2 and DW values. Indeed, statistical inference with such data needs many sophisticated tools, including the unit root testing and cointegration developed by econometricians over the past few decades.

#R3.4.1
set.seed(345)
tim=1:100
y=0.6*tim+cumsum(rnorm(100))
x=0.7*tim+cumsum(rnorm(100))
reg1=lm(y~x)
summary(reg1) #this shows the R-square=0.958, t-statistic=47.289
library(car)
durbin.watson(reg1, max.lag=4) #Rsquare=0.958 exceeds the DW statistic 0.1835
plot(x,y)
abline(reg1) #add the fitted line
lines(x,fitted(reg1))
plot.ts(x)
lines(y, lty=2)

Engle and Granger (1987) proposed a solution to the spurious regression problem which involves two-step estimation. The first step is the same as the spurious regression above, as in R3.4.1. The second step involves an error correction model (ECM), which requires the concept of cointegration. Let us discuss the basic ideas of cointegration in an elementary setting, avoiding a great many technicalities.

Co-integration: If two or more variables seem to trend together, there is an interest in studying their joint trending. Familiar examples are: short and long term interest rates, household incomes and expenditures, commodity prices (e.g., gold) in geographically separated markets, and capital appropriations and expenditures by business. Sometimes economic theory suggests that certain theoretically related variables have an equilibrium relation between them and the equilibrating process forces them to trend together. For example, the quantity that buyers are willing to demand (qd) and the quantity that sellers are willing to supply (qs) cannot drift too far apart. They may drift apart in the short run, perhaps due to seasonal


effects, but if they can drift apart without bound, there is no equilibrium. The equilibrium error, zt = qd − qs, is excess demand in this case. Adam Smith's invisible hand of market forces will tend to force zt toward zero, in the dynamic sense implied by zt ~ I(0).

Let us think of xt ~ I(1) and yt ~ I(1) as two dynamic random walkers moving on a dance floor. If they are dancing with each other, we expect them to stay together (maintaining some equilibrium distance between the two) and not drift apart from each other for any length of time. Cointegration is similar to the two dancers staying together.

    There is little loss of generality in considering the bivariate case in this chapter, although cointegration is more general for linear relations among several variables. Economic theory suggests that equilibrium error zt should not diverge over time, but rather should be stationary. When zt=0, the equilibrium is obviously achieved. But economic life is not so neat. If zt ~N(0, σ2), it can achieve the equilibrium several times, whenever there is a crossing from positive to negative values, or vice versa. Hence having zt become I(0) is a result supportive of economic theory. Cointegration is a statistical description of some equilibria which cannot be fooled by the presence of I(1) variables.

Co-integration is a study of the behavior of linear combinations of time series, and it looks for results which test economic theory. If two or more I(d) series are trending together, the order of integration of some particular linear combination zt, usually representing the equilibrium error, should be smaller than d. The joint trending of the two variables may mean that the long-run components cancel each other in some sense.

Error Correction Models (ECMs) of Co-integration: One of the appealing aspects of co-integration to the economist is its link with economic equilibrium. Dynamic economic models often give rise to a so-called equilibrium steady state. For a two-variable system a typical error correction model (ECM) expresses ∆yt, the change in one variable, as a function of past equilibrium errors (ut) and past changes in both variables:

∆xt = γ1(yt-1 − βxt-1) + [p lags of ∆xt and ∆yt] + white noise errors
∆yt = γ2(yt-1 − βxt-1) + [p lags of ∆xt and ∆yt] + white noise errors    (3.4.1)

where p is intended to be a small number (1 to 3) and where the errors may be correlated. If yt ~ I(1) and xt ~ I(1), and they are cointegrated, the errors in the first-stage regression of yt on xt are of a lower order of integration than 1, i.e., I(0) or stationary. In other words, (yt-1 − βxt-1) represents a linear combination of two I(1) variables and is called the error correction term, which is I(0) only if the variables are cointegrated. Thus all variables in (3.4.1) are I(0), and the problems associated with I(1) variables are removed.

REMARK 3.4.1 (Economic equilibria and error reduction through learning): If economic agents are intelligent humans (not automatons) they should learn from past mistakes and adjust their behavior over time to changing market conditions. If a model relation among two or more variables representing some equilibrating forces is supported by the data, it should satisfy the usual criteria, including: (i) goodness of fit or R2, (ii)


significance of coefficients, (iii) a suitable DW statistic. In addition, we might be able to use the vector of residuals {et} to reveal some learning by agents, as follows.

(a) The |et| as a function of time should have a negative slope. (b) If the residuals exhibit oscillations, they should be damped over time. (c) If the DW statistic reveals AR(1) errors, the autoregressive coefficient ρ should be small (not close to unity).

Once a strong cointegrating relation is established, which cannot be fooled by the presence of I(1) variables, we should refocus attention on understanding the nature of the economic equilibrium. Vinod (2006) suggests computer intensive (maximum entropy density based) bootstrap methods available in the R package called 'meboot'. Clearly, some inference problems involving properties of {et} can be handled by meboot.

Consider Irving Fisher's hypothesis, which says that moneylenders should demand higher nominal interest rates ιt if expected inflation πt is high. It is stated in (3.2.12) as ιt = β0 + β1[µ0 + µ1πt] + εt. This is an economic equilibrium model, where intelligent agents who failed to ask for higher nominal interest rates during inflationary time t-1 will learn from their mistakes and demand a higher ιt. The cointegration tests can help us decide whether the Fisher model relation is spurious or real. Once we rule out a spurious relation, the primary economic issue of interest is whether agents (money lenders) are learning from past mistakes, and the secondary issue is the speed of learning.

I believe that AR(1) autocorrelated residuals with |ρ| close to 1 suggest no learning at all, while |ρ| = 0.2, say, suggests fast learning. Cointegration analysis reveals no further information about the speed of learning by economic agents; a study of residuals should generally be undertaken to understand it. Vinod (2006) argues that the original concept of equilibrium in Samuelson's celebrated Harvard dissertation of 1941 considered evolving economic dynamics. The focus on orders of integration implicit in the cointegration literature is not really required by the underlying economic logic of error reduction by economic agents over time if they are approaching a dynamic equilibrium; it merely serves to avoid accepting spurious regressions as true theories.

REMARK 3.4.2 (Signs and significance of coefficients on past errors in the ECM regression): If the equilibrium error experienced by the economic agent at time t-1 is positive, this means yt-1 > b xt-1. Decreasing the left side (yt < yt-1, or ∆yt < 0) and increasing the right side (b xt > b xt-1, i.e., ∆xt > 0, since b > 0) of the inequality during the current period reduces this error. If the agent learns from past errors in this predictable way, it has implications for the signs of the coefficients in (3.4.1), namely non-rejection of the two hypotheses γ1 > 0 and γ2 < 0.

The Engle-Granger two-step procedure is as follows.

Step 1: OLS on the static regression yt = βxt + εt gives b as the OLS estimator of β. It can be shown that (b − β) ~ Op(T^-1), where Op denotes order in probability; this simply means that with T observations b approaches the true value β at rate T, faster than the usual rate √T, that is, b is a super-consistent estimator of β.

Step 2: Substitute this b in the error correction model

∆xt = γ1(yt-1 − b xt-1) + [p lags of ∆xt and ∆yt] + white noise errors
∆yt = γ2(yt-1 − b xt-1) + [p lags of ∆xt and ∆yt] + white noise errors    (3.4.2)

The Engle-Granger theorem shows that the usual OLS standard errors in the second step provide consistent estimates of the true standard errors (see Banerjee et al., p. 159). We perform the second step for the technical statistical reason that we want to obtain consistent estimates from an OLS regression.
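
In practice the first-stage residuals are usually checked for stationarity before step 2. A minimal sketch (an illustration, not the author's code) using simulated cointegrated series and the adf.test function of the tseries package; note that the tabulated Dickey-Fuller critical values are only approximate when the test is applied to regression residuals.

#Hedged sketch: check whether the first-stage residuals look I(0)
library(tseries)
set.seed(345)
x=cumsum(rnorm(100))   #an I(1) regressor
y=0.7*x+rnorm(100)     #cointegrated with x by construction
r1=resid(lm(y~x))      #first-stage (static regression) residuals
adf.test(r1)           #a small p-value suggests stationary residuals (cointegration)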

Econometricians in the 1970s started worrying about static time series regressions of the levels of yt on xt, due to spurious regression results. Then came models with general specifications, followed by simplifications dictated by statistical tests, e.g., Hendry, Mizon, Richard, etc. Whenever several variables and their lags are included in a model, the chances of having a cointegrated subset of regressors are obviously enhanced. The Engle-Granger two-step estimator reinstated some faith in static regression in the late 1980s. Thus the literature seems to have come full circle.

We note in passing some important problems: (i) Bias is present in OLS estimation of the cointegrating parameter, and simulations have documented poor performance in some situations; Mankiw and Shapiro (1986, JME, p. 165) document the presence of size distortions in certain t tests. (ii) The OLS coefficients follow non-normal distributions, often expressed as functionals of a Wiener process. (iii) The OLS error process may not be I(0) or a martingale difference sequence (MDS). (iv) If more than one cointegrating vector exists, there will be problems due to a failure of weak exogeneity.

#R3.4.2 #Let us write a function called Lag1
Lag1=function(x) {
# input= x vector, Output= two lined up vectors for x and x sub (t-1)
n=length(x)
y=x[1:(n-1)]
list(x=x[2:n], y=y) } # function ends
set.seed(345)
x=cumsum(rnorm(100))
y=0.7*x+rnorm(100)
reg1=lm(y~x); reg1
r1=resid(reg1)
b=reg1$coef[2] #extract the slope
n=length(y)
#y=ts(y)
#x=ts(x)
dx=diff(x,1)
dy=diff(y,1)
Ldy=Lag1(dy)
Ldx=Lag1(dx)


Lr1=Lag1(r1[2:n])
length(Lr1$y)
length(Ldy$x)
cb=cbind(Ldy$x,Lr1$y,Ldy$y,Ldx$y)
head(cb)
tail(cb)
reg2=lm(Ldy$x~Lr1$y+Ldy$y+Ldx$y)
summary(reg2)
# the coefficient of the lagged residual should be near -1
# Remark 3.4.2 shows why the coefficient should be negative

EXERCISE: multivariate ARDL model

YEAR  G      PG     Y      PNC    PUC    PPT    PD     PN     PS     POP
1960  129.7  .925   6036   1.045  .836   .810   .444   .331   .302   180.7
1961  131.3  .914   6113   1.045  .869   .846   .448   .335   .307   183.7
1962  137.1  .919   6271   1.041  .948   .874   .457   .338   .314   186.5
1963  141.6  .918   6378   1.035  .960   .885   .463   .343   .320   189.2
1964  148.8  .914   6727   1.032  1.001  .901   .470   .347   .325   191.9
1965  155.9  .949   7027   1.009  .994   .919   .471   .353   .332   194.3
1966  164.9  .970   7280   .991   .970   .952   .475   .366   .342   196.6
1967  171.0  1.000  7513   1.000  1.000  1.000  .483   .375   .353   198.7
1968  183.4  1.014  7728   1.028  1.028  1.046  .501   .390   .368   200.7
1969  195.8  1.047  7891   1.044  1.031  1.127  .514   .409   .386   202.7
1970  207.4  1.056  8134   1.076  1.043  1.285  .527   .427   .407   205.1
1971  218.3  1.063  8322   1.120  1.102  1.377  .547   .442   .431   207.7
1972  226.8  1.076  8562   1.110  1.105  1.434  .555   .458   .451   209.9
1973  237.9  1.181  9042   1.111  1.176  1.448  .566   .497   .474   211.9
1974  225.8  1.599  8867   1.175  1.226  1.480  .604   .572   .513   213.9
1975  232.4  1.708  8944   1.276  1.464  1.586  .659   .615   .556   216.0
1976  241.7  1.779  9175   1.357  1.679  1.742  .695   .638   .598   218.0
1977  249.2  1.882  9381   1.429  1.828  1.824  .727   .671   .648   220.2
1978  261.3  1.963  9735   1.538  1.865  1.878  .769   .719   .698   222.6
1979  248.9  2.656  9829   1.660  2.010  2.003  .821   .800   .756   225.1
1980  226.8  3.691  9722   1.793  2.081  2.516  .892   .894   .839   227.7
1981  225.6  4.109  9769   1.902  2.569  3.120  .957   .969   .926   230.0
1982  228.8  3.894  9725   1.976  2.964  3.460  1.000  1.000  1.000  232.2
1983  239.6  3.764  9930   2.026  3.297  3.626  1.041  1.021  1.062  234.3
1984  244.7  3.707  10421  2.085  3.757  3.852  1.038  1.050  1.117  236.3
1985  245.8  3.738  10563  2.152  3.797  4.028  1.045  1.075  1.173  238.5
1986  269.4  2.921  10780  2.240  3.632  4.264  1.053  1.069  1.224  240.7
1987  276.8  3.038  10859  2.321  3.776  4.413  1.085  1.111  1.271  242.8
1988  279.9  3.065  11186  2.368  3.939  4.494  1.105  1.152  1.336  245.0
1989  284.1  3.353  11300  2.414  4.019  4.719  1.129  1.213  1.408  247.3
1990  282.0  3.834  11389  2.451  3.926  5.197  1.144  1.285  1.482  249.9
1991  271.8  3.766  11272  2.538  3.942  5.427  1.167  1.332  1.557  252.6
1992  280.2  3.751  11466  2.528  4.113  5.518  1.184  1.358  1.625  255.4
1993  286.7  3.713  11476  2.663  4.470  6.086  1.200  1.379  1.684  258.1
1994  290.2  3.732  11636  2.754  4.730  6.268  1.225  1.396  1.734  260.7
1995  297.8  3.789  11934  2.815  5.224  6.410  1.239  1.419  1.786  263.2

#Read the above data into R and compute
gd=read.table(file="c:\\data\\gasolinedata.txt", header=T)
attach(gd)
summary(gd)
lnG=log(G) #where G=total US gasoline consumption (expenditure/price index)
lnPG=log(PG) #where PG=price index for gasoline
lnY=log(Y) #where Y=per capita disposable income
lnPNC=log(PNC) #where PNC=price index for new cars
lnPUC=log(PUC) #where PUC=price index for used cars
lnPPT=log(PPT) #where PPT=price index for public transport
lnPD=log(PD) #where PD=aggregate price index for consumer durables
lnPN=log(PN) #where PN=aggregate price index for consumer non-durables
lnPS=log(PS) #where PS=aggregate price index for consumer services
lnPOP=log(POP) #where POP=US total population
lnGpc=log(100*G/POP) #per capita gasoline consumption (defined before use below)
tim=1:length(PPT)
reg1=lm(lnGpc~lnPG+lnY+lnPNC+lnPUC+lnPPT+lnPD+lnPN+lnPS)
summary(reg1)

lg = Log(100*G/Pop)
ly = Log(Y)
lpg = Log(Pg)
lpnc = Log(Pnc)
lpuc = Log(Puc)
lppt = Log(Ppt)

The following software in LIMDEP needs to be translated to R. In LIMDEP, the bracketed x[-1] means x lagged once, etc.

# trend t=trn(1,1)
Create ; lg1=lg[-1] ; ly1=ly[-1] ; ly2=ly[-2] ; lp1=lpg[-1] ; lp2=lpg[-2] $
Sample ; 3 - 36 $
?
? Unrestricted distributed lag model
?
Regress; Lhs = lg ; Rhs = One,lpnc,lpuc,lppt,t,lpg,lp1,lp2,ly,ly1,ly2 $
Calc ; List ; eeu=sumsqdev ; LRPrice = b(6)+b(7)+b(8) ; LRIncome=b(9)+b(10)+b(11) $
?
? Autoregressive distributed lag model. Adds lagged dependent variable.
?
Regress; Lhs = lg ; Rhs = One,lpnc,lpuc,lppt,t,lpg,lp1,lp2,ly,ly1,ly2,lg1 $
Calc ; List ; eeardl=sumsqdev ; LRPrice = (b(6)+b(7)+b(8))/(1-b(12)) ; LRIncome= (b(9)+b(10)+b(11))/(1-b(12)) $
?
? ARMAX 1,1 model. Same as previous + MA term in disturbance
?
Armax ; Lhs = lg ; Rhs = One,lpnc,lpuc,lppt,t,lpg,lp1,lp2,ly,ly1,ly2 ; Model = 1,0,1 $
Calc ; List ; ee101=sumsqdev ; LRPrice = (b(7)+b(8)+b(9))/(1-b(1)) ; LRIncome= (b(10)+b(11)+b(12))/(1-b(1)) $
?
? ARMAX 1,2 model
?
Armax ; Lhs = lg ; Rhs = One,lpnc,lpuc,lppt,t,lpg,lp1,lp2,ly,ly1,ly2 ; Model = 1,0,2 $
Calc ; List ; ee102=sumsqdev ; LRPrice = (b(7)+b(8)+b(9))/(1-b(1)) ; LRIncome= (b(10)+b(11)+b(12))/(1-b(1)) $
?
? Likelihood ratio tests for MA disturbance terms


?
Calc ; List ; LRtest = n*log(eeardl/ee101) $
Calc ; List ; LRtest = n*log(eeardl/ee102) $
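
A hedged R starting point for this translation (an illustration, not a complete solution to the exercise): the unrestricted distributed lag and ARDL versions of the gasoline demand equation can be fitted with the dynlm package (an assumption; the chapter itself uses the dyn package), provided the data frame gd created above is in memory.

#Hedged sketch: partial R translation of the LIMDEP program using dynlm
library(dynlm)
lg=ts(log(100*gd$G/gd$POP), start=1960)  #log per capita gasoline consumption
ly=ts(log(gd$Y), start=1960); lpg=ts(log(gd$PG), start=1960)
lpnc=ts(log(gd$PNC), start=1960); lpuc=ts(log(gd$PUC), start=1960)
lppt=ts(log(gd$PPT), start=1960)
#Unrestricted distributed lag model: lags 0-2 of log price and log income, plus trend
m.dl=dynlm(lg ~ lpnc + lpuc + lppt + trend(lg) + L(lpg, 0:2) + L(ly, 0:2))
#ARDL model: add the lagged dependent variable
m.ardl=dynlm(lg ~ lpnc + lpuc + lppt + trend(lg) + L(lpg, 0:2) + L(ly, 0:2) + L(lg, 1))
b=coef(m.ardl)
blag=b[grep("L\\(lg", names(b))]                           #coefficient of lagged lg
LRPrice=as.numeric(sum(b[grep("lpg", names(b))])/(1-blag)) #long-run price effect
LRIncome=as.numeric(sum(b[grep("L\\(ly", names(b))])/(1-blag)) #long-run income effect
c(LRPrice=LRPrice, LRIncome=LRIncome)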

    3.5 Granger Causality Testing

This topic is discussed in various R packages, including the Bayesian vector autoregression package called MSBVAR, which estimates all possible bivariate Granger causality tests for m variables. In the bivariate case m = 2. Denote the two variables as X and Y. The basic aim is to evaluate whether past values of X are useful for predicting Y.

Granger NON-causality null hypothesis: the past p values of X (Xt-1 to Xt-p) do not help in predicting the value of Y.

Regress: Yt = f(Yt-1 to Yt-p, and Xt-1 to Xt-p).

An F-test determines whether the coefficients of the past values of X are jointly zero. The granger.test function in the MSBVAR package of R reports a matrix with m*(m-1) rows, one for each possible bivariate Granger causal relation. The results include F-statistics and p-values for each test. Estimation is simple and uses OLS models.
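
The same bivariate test can also be carried out by hand with two OLS regressions, the restricted one omitting the lags of X, followed by an F test. The following is a minimal sketch on simulated data (an illustration, not from the original text); the MSBVAR snippet below performs the analogous test for every pair of variables in a data set.

#Hedged sketch: bivariate Granger non-causality test by hand with p=2 lags
set.seed(1)
n=200
x=as.numeric(arima.sim(list(ar=0.5), n))
y=numeric(n)
for (t in 2:n) y[t]=0.6*y[t-1]+0.5*x[t-1]+rnorm(1)  #past x really does drive y
p=2
tt=(p+1):n
ylag1=y[tt-1]; ylag2=y[tt-2]
xlag1=x[tt-1]; xlag2=x[tt-2]
unres=lm(y[tt] ~ ylag1+ylag2+xlag1+xlag2)  #unrestricted regression
restr=lm(y[tt] ~ ylag1+ylag2)              #restricted: lags of x dropped
anova(restr, unres)  #F test of Granger non-causality; a small p-value rejects it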

#R3.5.1
library(MSBVAR)
data(IsraelPalestineConflict)
granger.test(IsraelPalestineConflict, p=6)
