
Hatanaka econometry


ADVANCED TEXTS IN ECONOMETRICS

General Editors

C. W. J. GRANGER G. E. MIZON


Time-Series-Based Econometrics

Unit Roots and Co-Integrations

Michio Hatanaka

OXFORD UNIVERSITY PRESS

This book has been printed digitally and produced in a standard specification in order to ensure its continuing availability

OXFORD UNIVERSITY PRESS

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York

Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Sao Paulo Shanghai Taipei Tokyo Toronto

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Michio Hatanaka, 1996

The moral rights of the author have been asserted

Database right Oxford University Press (maker)

Reprinted 2003

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose this same condition on any acquirer

ISBN 019-877353-6


Preface

The most vigorous development in econometrics in recent years has undoubtedly been the unit-root field, including error correction and co-integration. I am most grateful to the editors of Advanced Texts in Econometrics, Professors C. W. J. Granger and G. E. Mizon, for allowing me an opportunity to survey these developments. The survey has naturally suggested a number of research topics, and the results of one of them are included in the book. It is assumed that readers are acquainted with (a) fundamentals of the algebra of linear vector spaces, (b) standard time-series analysis of stationary processes including the linear prediction theory, and (c) standard asymptotic theory of inference used in econometric theory. However, no other mathematics or statistics are presupposed nor is mathematical rigour sought in the writing. In particular unit-root problems are explained from their most elementary starting-points. Graduate students at an advanced level should be able to understand the book. The statistical procedures are explained in detail, and the results of applications are emphasized. The applications that I have in mind are primarily to macro-economic time series rather than to financial series, the analyses of the two kinds of series often requiring different concepts and tools. My survey does not trace the historical sequence of developments, but rather selects the topics worth noting as at the time of writing (the second half of 1992 to the end of 1993).

The book consists of two parts. Part I deals with the univariate unit root, i.e. to see if a stochastic trend is present when each time series is analysed separately. It is summarized in the last part of Chapter 1. Part II discusses co-integration, i.e. the empirical investigation of long-run relationships among a number of time series.

Critics of unit-root tests deny even their motivation. I disagree. The question that the tests try to answer is whether or not the economy is capable of restoring the trend line of a variable after it is forced by shocks to deviate from the line. If the answer is affirmative (negative) the variable is trend (difference) stationary. What the trend line is, and to what extent we can answer the question, is examined in Part I. As in the case of other problems in econometrics one should be warned against optimism about the precision of results, but I cannot imagine that answers to the question can be irrelevant to macroeconomic modelling.

If unit roots are involved in a set of variables we can investigate the long-run relationships among the variables, which is a new branch of econometrics. Moreover, unit roots influence the appropriate selection of inference methods to be adopted in all econometric studies. The new econometrics differs from the old even at the level of a simple regression model in its theoretical aspects, if not in most computations. This is explained in Part II.


Most of Part II was written before I had access to another book on co-integration by Professors Banerjee, Dolado, Galbraith, and Hendry, Co-integration, Error-Correction, and the Econometric Analysis of Non-Stationary Data (Oxford University Press, 1993). It turned out that there is virtually no overlapping, which shows that there are divergent developments in this field. Needless to say, I did later take advantage of their book.

Yasuji Koto has kindly permitted me to incorporate a portion of our joint research in this book. Clive Granger, Koichi Maekawa, Kimio Morimune, and Hiro Toda have kindly read the manuscript, and offered valuable advice to improve my writing. Mototsugu Fukushige has provided me with a large number of references that are difficult to obtain in Japan. Noriko Soyama has helped me a great deal with her efficient typing of the manuscript. I feel much obliged to all of them.

Michio Hatanaka
Machikaneyama, Toyonaka, Osaka
July 1994

Contents

List of figures

PART I UNIT-ROOT TESTS IN UNIVARIATE ANALYSIS

1. Stochastic Trend and Overview of Part I
1.1 Stochastic Trend
1.2 Stochastic Trend as a Logical Implication of an Economic Theory?
1.3 Influences upon the Testing of Economic Theories
1.4 Overview of Part I

2. Trend Stationarity vs. Difference Stationarity
2.1 Basic Discrimination
2.2 Long-Run Component
2.3 Dominating Root of Characteristic Polynomial
2.4 Non-Separate Hypotheses
2.5 Time Aggregation and Other Remarks on the Data

3. Discrimination in Terms of the Long-Run Component: A Test for Trend Stationarity
3.1 Non-parametric Variance Ratios and Mean-Reverting
3.2 Difficulty of Discrimination through the Non-parametric Variance Ratios
3.3 Time-Series Decomposition
3.4 Parametric MA Unit-Root Test: A Test for Trend Stationarity against Difference Stationarity

4. Unit-Root Asymptotic Theories (I)
4.1 Pure Random Walk without a Drift
4.2 Pure Random Walk possibly with a Drift

5. Regression Approach to the Test for Difference Stationarity (I)
5.1 A Method that Does not Work
5.2 Dickey-Fuller Test
5.3 The Case where the Deterministic Trend is Confined to a Constant

6. Unit-Root Asymptotic Theories (II)
6.1 Deterministic Trends
6.2 Serial Correlations in $\Delta x_t$
6.3 MA Unit-Root Test

7. Regression Approach to the Test for Difference Stationarity (II)
7.1 ???
7.2 ARMA in General and the Schwert ARMA
7.3 Miscellaneous Remarks

Highlights of Chapters 4-7

8. Viewing the Discrimination as a Model Selection Problem Including Deterministic Trends
8.1 Various Modes of Deterministic Trends
8.2 Encompassing and 'General-to-Specific' Principles on the Model Selection
8.3 Simulation Studies on the Comparison of P-values

9. Results of the Model Selection Approach
9.1 Deterministic Trends Amenable to the DS and the TS
9.2 Discrimination between TS and DS

10. Bayesian Discrimination
10.1 Differences between the Bayesian and the Classic Theories
10.2 Different Versions of the Hypotheses
10.3 Problems Associated with the First Version
10.4 Point Null Hypotheses
10.5 Results in the Second and the Third Versions

PART II CO-INTEGRATION ANALYSIS IN ECONOMETRICS

Overview

11. Different Modelling Strategies on Multiple Relationships
11.1 Economic Models and Statistical Models
11.2 Weak Exogeneity
11.3 Granger Non-Causality

12. Conceptual Framework of the Co-Integration and its Relation to Economic Theories
12.1 Co-integration in the MA Representation
12.2 Granger Representation Theorem
12.3 Economic Theory and Co-integration

Highlights of Chapter 12

13. Asymptotic Inference Theories on Co-Integrated Regressions
13.1 Pure Random Walk
13.2 Deterministic Polynomial Trends
13.3 Serially Correlated Case
13.4 Miscellaneous Remarks including Direction of Co-integration
13.5 A Priori Specified Co-integrating Vector
13.6 Shortcomings in Many Past Empirical Studies

Highlights of Chapter 13

14. Inference on Dynamic Econometric Models
14.1 Hendry Model with the Two-Step Method
14.2 Dynamic Equation with the Two-Step Method
14.3 Hendry Model with the Single-Step Method
14.4 Dynamic Equation with the Single-Step Method

15. Maximum-Likelihood Inference Theory of Co-Integrated VAR
15.1 Determination of Co-integration Rank
15.2 Testing for Restrictions on the Co-integration Space
15.3 A Common Practice on B'
15.4 Weak Exogeneity and Granger Non-causality
15.5 Applications

Appendix 1 Spectral Analysis
Appendix 2 Wiener (Brownian Motion) Process
Appendix 3 Asymptotic Theories involving a Linear Deterministic Trend
Appendix 4 OLS Estimator of Difference-Stationary Autoregressive Process
Appendix 5 Mathematics for the VAR, VMA, and VARMA
Appendix 6 Fully Modified Least-Squares Estimator

References

Subject Index

Author Index

List of Figures

1.1(a) US real GDP
1.1(b) Japanese real GNP
1.2(a) US stock price
1.2(b) Japanese stock price
1.3(a) White noise
1.3(b) Random walk
2.1 Long-run component
2.2 Dominating roots of characteristic polynomial
3.1 Bartlett spectral window function
3.2 Spectral density function of $\Delta x_t$
3.3 Product of window and spectral density
4.1 Cumulative distributions of functionals of Wiener process
4.2 A rough sketch of probability density function
7.1 Spectral density functions of Schwert ARMAs
7.2 Original and windowed spectral density functions: (a) underestimation (b) overestimation
8.1(a) Array of Models
8.1(b) A portion of the array
8.2 Illustrations of joint distributions of P-values: (a) DGP is a member of M ???
8.3 Joint distributions of P-values (random walk)
8.4 Joint distributions of P-values (Schwert ARMA, I)
8.5 Joint distributions of P-values (Schwert ARMA, II)
8.6 Joint distributions of P-values (stationary AR, I)
8.7 Joint distributions of P-values (stationary AR, II)
9.1 Historical data of real GNP, log: (a) Original series (b) Differenced series (c) Deterministic trend for TS
9.2 Post-war data of real consumption, log: (a) Original series (b) Differenced series (c) Deterministic trend
9.3 Post-war data of unemployment rate: (a) Original series (b) Differenced series (c) Deterministic trend
9.4 Post-war data of inflation rate: (a) Original series (b) Differenced series (c) Deterministic trend
10.1 Bayesian posterior distribution and Sampling distribution: (a) Bayesian posterior distribution for • or • (b) Sampling distribution for •
???
13.1 Consumption and income per capita
13.2 A tree of $(q_1, q_2)$

PART I

Unit-Root Tests in Univariate Analysis

1 Stochastic Trend and Overview of Part I

For a long time each economic time series has been decomposed into a deterministic trend and a stationary process. In recent years the idea of stochastic trend has emerged, and enriched the framework of analysis to investigate economic time series. In the present chapter I shall (a) explain the meaning of stochastic trend, (b) consider whether it can be derived from any economic theories, and (c) illustrate the impact that it has given to empirical studies of economic theories.

1.1 Stochastic Trend

To explain stochastic trends let us consider two of the simplest models,

$$x_t - x_{t-1} = \mu \tag{1.1d}$$

and

$$x_t - x_{t-1} = \varepsilon_t \tag{1.1s}$$

where $\mu$ is a non-stochastic constant, while $\{\varepsilon_t\}$ is i.i.d., with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. (The letters d and s associated with the equation numbers stand for 'deterministic' and 'stochastic' respectively.) When these processes start from $t = 0$, (1.1d) and (1.1s) generate $\{x_t\}$ respectively as

$$x_t = x_0 + \mu t \tag{1.2d}$$

$$x_t = x_0 + \sum_{s=1}^{t} \varepsilon_s \tag{1.2s}$$

$\{\mu t\}$ is a deterministic trend while $v_t = \sum_{s=1}^{t} \varepsilon_s$ is a stochastic trend. $\{v_t\}$ is a non-stationary stochastic process as $E(v_t^2) = t\sigma_\varepsilon^2$, which does depend upon $t$ and diverges as $t \to \infty$. $\{v_t\}$ is called the random-walk process. Incidentally $x_0$ in (1.2s) may be a random variable or a non-stochastic constant.

We are attempting to reveal the meaning of the newly developed stochastic trend by contrasting it with a traditional time-series model. (1.1d) is too simple for the purpose because the traditional model contains not only a deterministic trend but also a stationary process. Therefore let us add a stationary process, $\{u_t\}$, to the right-hand side of (1.2d). I use an autoregressive process of order 1, AR(1), for illustration.

$$x_t = x_0 + \mu t + u_t, \qquad u_t = (1 - aL)^{-1}\varepsilon_t \tag{1.3d}$$

where $L$ is the lag operator, $|a| < 1$, and $\{\varepsilon_t\}$ is i.i.d., with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$.
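The three models above can be simulated directly. The following is a minimal sketch rather than anything taken from the book: the function names, the Gaussian shocks, the parameter values, and the start-up convention $u_0 = 0$ are all assumptions made for illustration. The second function gives a Monte Carlo check of the statement that $E(v_t^2)$ grows in proportion to $t$.

```python
import random


def simulate_paths(n, mu=0.5, a=0.6, sigma=1.0, x0=0.0, seed=0):
    """Return paths x_0..x_n for the models (1.2d), (1.2s), and (1.3d).

    Assumptions for illustration: Gaussian shocks, u_0 = 0, and the same
    shocks feeding both the random walk and the AR(1) component.
    """
    rng = random.Random(seed)
    det, rw, ts = [x0], [x0], [x0]
    v = 0.0  # stochastic trend v_t = eps_1 + ... + eps_t
    u = 0.0  # stationary AR(1) component u_t = a * u_{t-1} + eps_t
    for t in range(1, n + 1):
        eps = rng.gauss(0.0, sigma)
        v += eps
        u = a * u + eps
        det.append(x0 + mu * t)      # deterministic trend (1.2d)
        rw.append(x0 + v)            # stochastic trend (1.2s)
        ts.append(x0 + mu * t + u)   # trend plus stationary AR(1) (1.3d)
    return det, rw, ts


def mean_square_of_v(t, reps=2000, sigma=1.0, seed=1):
    """Monte Carlo estimate of E(v_t^2); theory gives t * sigma^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        v = sum(rng.gauss(0.0, sigma) for _ in range(t))
        total += v * v
    return total / reps


if __name__ == "__main__":
    det, rw, ts = simulate_paths(100)
    print("x_100:", det[-1], round(rw[-1], 3), round(ts[-1], 3))
    print("E(v_20^2) estimate:", round(mean_square_of_v(20), 2))
    print("E(v_80^2) estimate:", round(mean_square_of_v(80), 2))
```

The estimates of $E(v_t^2)$ should lie near 20 and 80, up to simulation noise.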

The best way to understand the meaning of stochastic trend is to compare (1.2s) with (1.2d) and/or (1.3d) in terms of their time-series properties.

1.1.1 Prediction error

Let us consider predicting $x_{t+\tau}$ for each of $\tau = 1, 2, \ldots$ from $(x_t, x_{t-1}, \ldots, x_1)$. In the model (1.2d), assuming that $\mu$ and $x_0$ are known, the prediction of $x_{t+\tau}$ is $x_0 + \mu(t + \tau)$, and the prediction error is zero. In describing the prediction in the model (1.3d) I assume that readers are acquainted with basic parts of the prediction theory in the linear process through, for example, Sargent (1979), Granger and Newbold (1986), or Hamilton (1994).1 The best linear prediction of $x_{t+\tau}$ is the projection of $x_{t+\tau}$ upon $(x_t, x_{t-1}, \ldots, x_1, x_0)$, which will be denoted by $P(x_{t+\tau} \mid x_t, \ldots, x_1, x_0)$. Since $x_0 + \mu(t + \tau)$ is perfectly predictable we consider $P(u_{t+\tau} \mid x_t, \ldots, x_1, x_0)$. Since $u_s = x_s - x_0 - \mu s$, $s = 1, \ldots, t$, it is seen that $(x_t, \ldots, x_1, x_0)$ corresponds one-to-one to $(u_t, \ldots, u_1, x_0)$, and $P(u_{t+\tau} \mid x_t, \ldots, x_1, x_0) = P(u_{t+\tau} \mid u_t, \ldots, u_1, x_0)$. This last projection is found equal to $a^\tau u_t$. Therefore the best linear prediction of $x_{t+\tau}$ is

$$x_0 + \mu(t + \tau) + a^\tau u_t.$$

The error of prediction of $x_{t+\tau}$ is the error of prediction of $u_{t+\tau}$, and is given by

$$u_{t+\tau} - a^\tau u_t = \varepsilon_{t+\tau} + a\varepsilon_{t+\tau-1} + \cdots + a^{\tau-1}\varepsilon_{t+1}. \tag{1.4d}$$

For each of $\tau = 1, 2, \ldots$ its variance is bounded above by the variance of $u_{t+\tau}$, which is $\sigma_\varepsilon^2(1 - a^2)^{-1}$. The bound is due to a well-known theorem in the projection theory.

Let us turn to the model of stochastic trend (1.2s). The best linear prediction of $x_{t+\tau}$ from $(x_t, \ldots, x_1, x_0)$ is $P(x_{t+\tau} \mid x_t, \ldots, x_1, x_0)$. Since there is a one-to-one correspondence between $(x_t, \ldots, x_1, x_0)$ and $(\varepsilon_t, \ldots, \varepsilon_1, x_0)$, we may consider $P(x_{t+\tau} \mid \varepsilon_t, \ldots, \varepsilon_1, x_0)$, which is

$$x_0 + \sum_{s=1}^{t} \varepsilon_s = x_t.$$

And this is the projection of each of $x_{t+\tau}$, $\tau = 1, 2, \ldots$. The error in the prediction of $x_{t+\tau}$ for $\tau = 1, 2, \ldots$ is

$$\varepsilon_{t+1} + \varepsilon_{t+2} + \cdots + \varepsilon_{t+\tau} \tag{1.4s}$$

which should be compared with (1.4d). The variance of (1.4s) is $\tau\sigma_\varepsilon^2$, and as $\tau \to \infty$ it diverges to infinity.

1 $\{x_t\}$ is said to be a linear process if $x_t = \mu + b_0\varepsilon_t + b_1\varepsilon_{t-1} + \cdots$, where $\sum |b_j| < \infty$ and $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$ for all $t$. The ARMA process is a linear process.

The model of a deterministic trend plus a stationary process and the model of stochastic trend have totally different views about how the world evolves in future. In the former the prediction error is bounded even in the infinite horizon, but in the latter the error becomes unbounded as the horizon extends.
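The contrast between a bounded and an unbounded prediction-error variance can be tabulated numerically. The sketch below is an illustration under assumed parameter values; the function names are hypothetical. It evaluates the variance of the prediction error at horizon $\tau$ in the trend-plus-AR(1) model and in the random-walk model, using only the variance statements given above.

```python
def pe_var_trend_stationary(tau, a, sigma2=1.0):
    """Variance of eps_{t+tau} + a*eps_{t+tau-1} + ... + a^(tau-1)*eps_{t+1}."""
    return sigma2 * sum(a ** (2 * j) for j in range(tau))


def pe_var_random_walk(tau, sigma2=1.0):
    """Variance of eps_{t+1} + ... + eps_{t+tau}."""
    return tau * sigma2


if __name__ == "__main__":
    a = 0.8  # illustrative AR(1) coefficient
    bound = 1.0 / (1.0 - a * a)  # sigma^2 / (1 - a^2) with sigma^2 = 1
    for tau in (1, 5, 20, 100):
        print(tau,
              round(pe_var_trend_stationary(tau, a), 4),
              pe_var_random_walk(tau),
              "bound:", round(bound, 4))
```

The first column of variances approaches the bound $\sigma_\varepsilon^2(1 - a^2)^{-1}$ from below, while the second grows linearly with the horizon.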

1.1.2 Impulse-response function

For any linear stochastic process $\{x_t\}$

$$x_t - P(x_t \mid x_{t-1}, x_{t-2}, \ldots) \tag{1.5}$$

is called the innovation or shock at $t$. In the model (1.3d) $P(x_t \mid x_{t-1}, \ldots)$ is $x_0 + \mu t + a(x_{t-1} - x_0 - \mu(t - 1))$, and the shock at $t$ is $\varepsilon_t$. In the model (1.2s) $P(x_t \mid x_{t-1}, \ldots)$ is $x_{t-1}$, and the shock at $t$ is $\varepsilon_t$.

Let us compare the stochastic trend and the traditional model in regard to the impact of the shock $\varepsilon_t$ upon $x_{t+\tau}$. The series of impacts on $x_{t+\tau}$, with index $\tau = 1, 2, \ldots$, while the time $t$ and $\varepsilon_t$ are fixed, is called the impulse-response function. In the model (1.3d) $\varepsilon_t$ enters into $x_{t+\tau}$ through $a^\tau\varepsilon_t$. The impulse-response function is $(a, a^2, \ldots)$. Since $|a| < 1$, the impact dies down to zero as $\tau \to \infty$. In the model (1.2s) each of $x_{t+\tau}$, $\tau = 1, 2, \ldots$ contains $\varepsilon_t$. The impulse-response function is $(1, 1, \ldots)$, which never dies down. The current shock has only a temporary effect in the model (1.3d), while the current shock has a permanent effect in the model (1.2s).
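One way to see the two impulse-response functions is to perturb a single shock and compare the resulting path with the unperturbed one. The sketch below does this for the random walk and for the trend-plus-AR(1) model; the helper names, the start-up convention $u_0 = 0$, and the parameter values are assumptions made for illustration.

```python
def path_random_walk(x0, shocks):
    """x_t = x_0 + eps_1 + ... + eps_t for t = 1..len(shocks)."""
    x, out = x0, []
    for e in shocks:
        x += e
        out.append(x)
    return out


def path_trend_stationary(x0, mu, a, shocks):
    """x_t = x_0 + mu*t + u_t with u_t = a*u_{t-1} + eps_t and u_0 = 0."""
    u, out = 0.0, []
    for t, e in enumerate(shocks, start=1):
        u = a * u + e
        out.append(x0 + mu * t + u)
    return out


def impulse_response(path_fn, shocks, t, horizon):
    """Effect on x_{t+tau}, tau = 1..horizon, of adding one unit to eps_t."""
    bumped = list(shocks)
    bumped[t - 1] += 1.0
    base, alt = path_fn(shocks), path_fn(bumped)
    return [alt[t - 1 + tau] - base[t - 1 + tau] for tau in range(1, horizon + 1)]


if __name__ == "__main__":
    shocks = [0.0] * 12
    print(impulse_response(lambda s: path_random_walk(0.0, s), shocks, 5, 4))
    print(impulse_response(lambda s: path_trend_stationary(0.0, 0.5, 0.8, s),
                           shocks, 5, 4))
```

Because both models are linear, the computed responses do not depend on the baseline shocks; zeros are used only to keep the arithmetic transparent.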

1.1.3 Returning to a central line

The difference mentioned in Section 1.1.2 may be expressed in another language. The model (1.3d) has a central line, $x_0 + \mu t$, around which $\{x_t\}$ oscillates. Even if shocks let $x_t$ deviate temporarily from the line, a force acts to bring it back to the line. On the other hand the model (1.2s) has no such central line. Indeed it is a random walk. One might wonder about a deterministic trend combined with a random walk. Indeed (1.1s) may be extended to

$$x_t - x_{t-1} = \mu + \varepsilon_t$$

which generalizes (1.2s) to

$$x_t = x_0 + \mu t + \sum_{s=1}^{t} \varepsilon_s$$

but here the discrepancy between $x_t$ and the line, $x_0 + \mu t$, becomes unbounded as $t \to \infty$.2

Having completed the contrast between the stochastic-trend model and the traditional model, we turn to some generalization of the stochastic trend.

2 Good references for Sections 1.1.1-1.1.3 are Nelson and Plosser (1982) and Stock and Watson (1988a).

If $\{\varepsilon_t\}$ in (1.2s) is replaced by a stationary AR(1), (1.1s) becomes

$$(1 - aL)(x_t - x_{t-1}) = \varepsilon_t \tag{1.3s}$$

where $|a| < 1$. Determine $b_0 = 1, b_1, b_2, \ldots$ from

$$(1 - L)^{-1}(1 - aL)^{-1} = b_0 + b_1 L + b_2 L^2 + \cdots$$

We get

$$b_j = 1 + a + \cdots + a^j = \frac{1 - a^{j+1}}{1 - a}.$$

Then the model of stochastic trend (1.3s) implies

$$x_t = x_0 + (1 - aL)^{-1} v_t \tag{1.3s'}$$

where $v_t = \sum_{s=1}^{t} \varepsilon_s$. The projection of $x_{t+\tau}$, $\tau = 1, 2, \ldots$, each upon $(x_t, x_{t-1}, \ldots, x_1, x_0)$ is

$$x_t + (a + a^2 + \cdots + a^\tau)(x_t - x_{t-1})$$

and the error of prediction is

$$b_0\varepsilon_{t+\tau} + b_1\varepsilon_{t+\tau-1} + \cdots + b_{\tau-1}\varepsilon_{t+1}. \tag{1.4s'}$$

As $\tau \to \infty$ the variance of the prediction error diverges to infinity. Since the shock at $t$ is $\varepsilon_t$, the impulse-response function is $(b_1, b_2, \ldots)$. Noting that $b_\tau \to (1 - a)^{-1}$ as $\tau \to \infty$, the current shock is seen to have a permanent effect.

1.1.4 Important terminology

Now I introduce the martingale process as it is often referred to in relation to the stochastic trend. A stochastic process $\{x_t\}$ is called a martingale process if

$$E(x_t \mid x_{t-1}, x_{t-2}, \ldots) = x_{t-1}. \tag{1.6}$$

In the linear process such as (1.3d), (1.2s), and (1.3s) the conditional expectation and the projection are interchangeable so that (1.6) may also be written

$$P(x_t \mid x_{t-1}, x_{t-2}, \ldots) = x_{t-1} \tag{1.6'}$$

or comparing it with (1.5),

$$x_t - P(x_t \mid x_{t-1}, x_{t-2}, \ldots) = \Delta x_t. \tag{1.6''}$$

The innovation is equal to the difference. In (1.3d) the innovation at $t$ is $\varepsilon_t$, while $\Delta x_t = \mu + \varepsilon_t - (1 - a)(1 - aL)^{-1}\varepsilon_{t-1}$. (1.3d) is not a martingale. In (1.2s) the innovation is $\varepsilon_t$, and $\Delta x_t$ is also $\varepsilon_t$. (1.2s) is a martingale. Continuing on (1.2s), if $\{\varepsilon_t\}$ is independently but not identically distributed (1.2s) is still a martingale. In (1.3s) the innovation is $\varepsilon_t$, while $\Delta x_t = (1 - aL)^{-1}\varepsilon_t$ so that it is not a martingale. However, it will be seen in Chapter 2 that the long-run component of $\{x_t\}$ in (1.3s) is a random walk, which is a martingale, and subsequently it will be seen that only the long-run component matters in the unit-root field. It is in this sense that the stochastic trend is a martingale process.
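For the model in which $\Delta x_t = (1 - aL)^{-1}\varepsilon_t$, the impulse-response weights $b_j$ and their limit $(1 - a)^{-1}$ can be computed by a short recursion. The sketch below is an illustration, not the author's derivation: it assumes the weights are defined by expanding $(1 - L)^{-1}(1 - aL)^{-1}$ in powers of $L$, which gives the recursion $b_j = (1 + a)b_{j-1} - a b_{j-2}$ with $b_0 = 1$. The function name is hypothetical.

```python
def b_coefficients(a, n):
    """Return b_0..b_n from (1 - (1 + a)L + a L^2)(b_0 + b_1 L + ...) = 1.

    Assumption for illustration: this is the expansion of
    (1 - L)^{-1} (1 - aL)^{-1}, so that b_j = 1 + a + ... + a^j.
    """
    b = []
    for j in range(n + 1):
        if j == 0:
            b.append(1.0)
        elif j == 1:
            b.append((1.0 + a) * b[0])
        else:
            b.append((1.0 + a) * b[j - 1] - a * b[j - 2])
    return b


if __name__ == "__main__":
    a = 0.5
    b = b_coefficients(a, 30)
    print([round(v, 4) for v in b[:5]])
    print("b_30:", round(b[30], 6), "limit:", 1.0 / (1.0 - a))
```

With $a = 0.5$ the weights rise from 1 towards 2, so a unit shock eventually shifts the level of $x_t$ by $(1 - a)^{-1}$ for ever.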

The random walk as a statistical description of stock price is quite old in the economic literature, but the idea of stochastic trend as a general model originates in the statistical literature (Box and Jenkins (1970, ch. 4)). They have proposed a model of $\{x_t\}$ such that $\Delta x_t$ is stationary but $x_t$ is not, or $\Delta^2 x_t = x_t - 2x_{t-1} + x_{t-2}$ is stationary but $\Delta x_t$ is not. The former is denoted by $I(1)$, reading that the order of integration is 1, and the latter is written $I(2)$. The series which is stationary without a differencing is $I(0)$. In (1.2s) and (1.3s) $\Delta x_t$ is stationary but $x_t$ is not so that $x_t$ is $I(1)$. The first difference of a stationary series is also stationary, but it has a special property as described in Chapter 2.
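The differencing operations behind the $I(1)$ and $I(2)$ notation are easy to state in code. The following sketch uses hypothetical helper names and a short hand-made shock sequence. It builds an $I(1)$ series by cumulating shocks once and an $I(2)$ series by cumulating twice, then recovers the shocks by differencing once and twice respectively.

```python
def cumulate(series, start=0.0):
    """Running sum of the series, starting from the given level."""
    level, out = start, []
    for v in series:
        level += v
        out.append(level)
    return out


def diff(series, d=1):
    """Apply the difference operator d times; each pass drops one observation."""
    for _ in range(d):
        series = [series[i] - series[i - 1] for i in range(1, len(series))]
    return series


if __name__ == "__main__":
    shocks = [1.0, -2.0, 0.5, 3.0, -1.5, 0.0, 2.0]
    i1 = cumulate(shocks)   # one cumulation: an I(1)-type series
    i2 = cumulate(i1)       # two cumulations: an I(2)-type series
    print(diff(i1))         # shocks from the second one onwards
    print(diff(i2, 2))      # shocks from the third one onwards
    # The second difference written out as x_t - 2 x_{t-1} + x_{t-2}
    print([i2[t] - 2 * i2[t - 1] + i2[t - 2] for t in range(2, len(i2))])
```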

We shall use the words difference stationarity and trend stationarity. Difference stationarity is appropriate for describing the stochastic trend, because it is the differencing operation(s) that is indispensable to transform the stochastic trend into a stationary process. In contrast the traditional model such as (1.3d) is called trend stationary. There $\{x_t\}$ is not stationary, but subtraction of $\mu t$ from $x_t$ makes it stationary. Differencing also makes it stationary, but it is not an indispensable operation to get the stationarity.

The equation (1.1s) is a difference equation with a forcing function $\{\varepsilon_t\}$. The characteristic equation is $\lambda - 1 = 0$, and the root is unity. In (1.3s) the characteristic equation is $\lambda^2 - (1 + a)\lambda + a = 0$, and one of its two roots is unity. Stochastic trends are characterized by a unit root, and the entire field that deals with stochastic trends is called the unit-root field.
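The claim that one root of $\lambda^2 - (1 + a)\lambda + a = 0$ is unity can be checked with the quadratic formula. The sketch below uses a hypothetical function name; it relies on the fact that the discriminant equals $(1 - a)^2$, so the roots are always real.

```python
import math


def characteristic_roots(a):
    """Roots of lambda^2 - (1 + a)*lambda + a = 0, returned in ascending order."""
    disc = (1.0 + a) ** 2 - 4.0 * a      # equals (1 - a)^2, never negative in theory
    root = math.sqrt(max(disc, 0.0))     # guard against tiny negative rounding error
    return sorted([((1.0 + a) - root) / 2.0, ((1.0 + a) + root) / 2.0])


if __name__ == "__main__":
    for a in (0.5, -0.3, 0.9):
        print(a, characteristic_roots(a))
```

For every $|a| < 1$ the two roots are $a$ and 1: the stationary root comes from the AR(1) part and the unit root from the differencing.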

Figure 1.1(a) and (b) shows real outputs in log scale in the USA and Japan on a quarterly basis. There have been considerable debates in recent years regarding the presence of a stochastic trend in these series. Figure 1.2(a) and (b) shows stock prices in log scale in the USA and Japan respectively on a quarterly basis. For a few decades it has widely been agreed among econometricians that these series contain stochastic trends. Figure 1.3(a) is a realization of an i.i.d. process, while 1.3(b) is a realization of a random-walk process. The latter is smoother. It is useful for data analysis to keep in mind that the initial value, $x_0$, in a random walk determines the level of subsequent realizations.

FIG. 1.1(a) US real GDP

FIG. 1.1(b) Japanese real GNP

FIG. 1.2(a) US stock price

FIG. 1.2(b) Japanese stock price

FIG. 1.3(a) White noise

Incidentally, visual inspection of time-series charts is indispensable to empirical studies as it suggests what kind of formal tests should be applied.

1.2 Stochastic Trend as a Logical Implication of an Economic Theory?3

Let $p_t$ be the price of an asset at time $t$, and $I_{t-1}$ be the information set at $t - 1$. The rational expectations hypothesis proposes $E(p_t \mid I_{t-1})$ as the expectation of $p_t$ as of time $t - 1$. The efficient market hypothesis suggests that this expectation is realized so that

$$E(p_t \mid I_{t-1}) = p_{t-1}. \tag{1.7}$$

Let us assume that $I_{t-1}$ contains only $(p_{t-1}, p_{t-2}, \ldots)$. Then

$$E(p_t \mid p_{t-1}, p_{t-2}, \ldots) = p_{t-1}. \tag{1.7'}$$

This is nothing but the definition of $\{p_t\}$ being a martingale process. Note that (1.7') is equivalent to $E(\Delta p_t \mid p_{t-1}, p_{t-2}, \ldots) = 0$, i.e. no part of $\Delta p_t$ is predictable from $(p_{t-1}, p_{t-2}, \ldots)$.

FIG. 1.3(b) Random walk

3 Yuichi Fukuta, Hiroshi Osano, and Mototsugu Shintani have kindly commented on an earlier draft of Sections 1.2 and 1.3. Remaining errors are mine.
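A simple, necessary (but not sufficient) implication of the martingale property is that price changes are uncorrelated with their own past. The sketch below is an illustration with assumed Gaussian shocks and hypothetical function names. It compares the lag-1 autocorrelation of $\Delta p_t$ for a random walk with that of a series whose changes follow an AR(1), where part of $\Delta p_t$ is predictable.

```python
import random


def simulate_changes(n, a=0.0, seed=0):
    """Price changes dp_t = a * dp_{t-1} + eps_t; a = 0 makes p_t a random walk."""
    rng = random.Random(seed)
    prev, out = 0.0, []
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


def lag1_autocorr(z):
    """Sample first-order autocorrelation of the series z."""
    n = len(z)
    mean = sum(z) / n
    num = sum((z[i] - mean) * (z[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in z)
    return num / den


if __name__ == "__main__":
    print("random walk:   ", round(lag1_autocorr(simulate_changes(5000, 0.0, 1)), 3))
    print("AR(1) changes: ", round(lag1_autocorr(simulate_changes(5000, 0.6, 1)), 3))
```

A value near zero is consistent with the martingale property; a value far from zero shows that past prices help to predict $\Delta p_t$.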

A number of economic theorists have investigated more closely than the above whether or not the martingale property of an asset price can be derived logically from an economic theory of idealized markets of goods and assets. LeRoy (1973) derives the property assuming that investors are risk-neutral or the (relative) risk aversion does not vary over time, but his emphasis is rather on denying the martingale property because investors are risk-averters with a time-varying risk aversion. Lucas (1978) shows that the martingale property of asset price requires more unrealistic assumptions.4 Finally a random walk is a martingale process, but the martingale process is not restricted to a random walk. The stochastic trend (1.1s) is not likely to be derived logically from an economic theory alone.

Incidentally the permanent-income hypothesis has been associated with the random-walk property of consumption in much of the literature. The association is attributed to Hall (1978), but what he obtains from an economic theory is

$$E(u'_{t+1} \mid u'_t) = \frac{1 + \delta}{1 + r}\, u'_t$$

where $u'$ is the marginal utility and $\delta$ and $r$ are respectively the rate of time preference and the real rate of interest. $\{u'_t\}$ is a martingale if $r = \delta$. If $\delta > r$, $\{u'_t\}$ is what is called a sub-martingale process. Since $E(\Delta u'_{t+1} \mid u'_t) > 0$ a part of $\Delta u'_{t+1}$ is predictable from $u'_t$, which is fundamentally different from the martingale property.5

1.3 Influences upon the Testing of Economic Theories

The stochastic trend has had a profound influence upon empirical studies of economic theories. Logically the influence should follow the empirical plausibility about the stochastic trends in many economic variables. The plausibility was provided by Nelson and Plosser (1982), using a method developed in Dickey and Fuller (1981). However, the research does not necessarily advance in a logical sequence. Earlier Davidson et al. (1978) used the error-correction model to relate the stochastic trend to a study of economic relationships. The following review classifies the influences of stochastic trends (i) upon testable implications of economic theories, and (ii) on testing for a kind of theoretical property that has hitherto been taken for granted, i.e. the long-run economic relationship.

4 Sims (1980a) derives the unpredictability of $\Delta p_{t+1}$ through an approximation that would be justified if the time unit is sufficiently short. Harrison and Kreps (1979) derive the martingale property of option prices from that of stock price.

5 Hayashi (1982) investigates the permanent-income hypothesis without assuming $\delta = r$.

1.3.1 Influences upon testable implications of economic theories

Many economic theories are expressed as the present-value model,

$$y_t = \sum_{i=0}^{\infty} \gamma^i E(x_{t+i} \mid I_t) \tag{1.8}$$

where $I_t$ is the information set up to and including $t$, and the discount factor $\gamma$ is non-stochastic and $1 > \gamma > 0$. In the permanent-income hypothesis $x_t$ is the labour income, and $y_t$ is that part of consumption that arises from the permanent labour income. Consumption responds to a change in the current income $x_t$ only in so far as the latter alters expectation of future income. Suppose that $I_t$ contains only $(x_t, x_{t-1}, \ldots)$. The new information at $t$ is the innovation, $x_t - E(x_t \mid I_{t-1})$. Consumers revise the previous expectation $E(\cdot \mid I_{t-1})$ into the current $E(\cdot \mid I_t)$ by assessing the impact of the innovation upon future labour income as explained in Section 2.2 below. This impact is represented by the impulse-response function, and we have found that the function takes different forms depending upon whether $\{x_t\}$ evolves like a linear trend plus a stationary process or like a stochastic trend. Flavin (1981) assumed the former (trend-stationary case), and found that observed consumptions respond to revisions of the expectation more sensitively than the permanent-income hypothesis (PIH) specifies. Mankiw and Shapiro (1985) and Deaton (1987) examined the latter (difference-stationary case), and concluded that observed responses of consumptions are too insensitive for the PIH.6
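The dependence of the revision on the impulse-response function can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the book's formulation: it takes the present value to be a plain discounted sum $\sum_{i \ge 0} \gamma^i E(x_{t+i} \mid I_t)$ with no proportionality factor, counts the current-period effect of the innovation as 1, and truncates the infinite sum at a long horizon. The function names and parameter values are hypothetical.

```python
def impulse_response_ts(a, horizon):
    """Effect of a unit innovation on x_{t+i}, i = 0..horizon-1, in the trend-plus-AR(1) case."""
    return [a ** i for i in range(horizon)]


def impulse_response_ds(horizon):
    """Effect of a unit innovation on x_{t+i} in the random-walk case."""
    return [1.0] * horizon


def present_value_revision(psi, gamma):
    """Revision of the discounted sum caused by a unit innovation."""
    return sum((gamma ** i) * p for i, p in enumerate(psi))


if __name__ == "__main__":
    gamma, a, horizon = 0.95, 0.8, 2000
    rev_ts = present_value_revision(impulse_response_ts(a, horizon), gamma)
    rev_ds = present_value_revision(impulse_response_ds(horizon), gamma)
    print("trend stationary:     ", round(rev_ts, 4))  # close to 1 / (1 - gamma * a)
    print("difference stationary:", round(rev_ds, 4))  # close to 1 / (1 - gamma)
```

Under these assumptions the same innovation warrants a much larger revision when income is difference stationary, which is why the judgement on whether observed consumption is too sensitive or too insensitive turns on the assumed income process.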

When $y_t$ in the present-value model (1.8) is stock price, $x_t$ is either the earning or dividend of the stock. Let us write

Then it follows that

Shiller (1981a) intended to test (1.9), which is called the volatility test. The resultwas definitive rejection of the inequality. However, Kleidon (1986) reveals thatthe test is invalid if the relevant time series contain stochastic trends.

Note that the var in (1.9) is the variance in the probability distribution of all events that might occur at a fixed time, $t$. Kleidon calls the distribution 'cross-sectional', and engineers refer to the moments of such a distribution as the ensemble average (or moment). If we are dealing with an ergodic stationary stochastic process, the ensemble moment can be estimated from the sample moments of a single realization, such as $\bar{x} = T^{-1}\sum x_t$ and $T^{-1}\sum (x_t - \bar{x})^2$. In fact Shiller (1981a) assumes that the stock price in log scale consists of a linear trend and a stationary process, and reformulates (1.9) so that var() can be estimated from the sample moments. However, when $\{y_t\}$ and $\{y_t^*\}$ have a stochastic trend there exists no simple correspondence between var() in (1.9) and the sample moments.8 Durlauf and Phillips (1988) make a detailed analysis of this point by the asymptotic theories that will be explained subsequently, and prove that the volatility test may be interpreted as a test of co-integration between $\{y_t\}$ and $\{y_t^*\}$ with coefficients 1 and $-1$.

6 See Quah (1990) and Campbell and Deaton (1989) for more about this point.

7 When $\{x_t\}$ is Gaussian the conditional variance never exceeds the unconditional variance. However, it is known that stock price is not Gaussian in terms of a month or shorter time units.
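The breakdown of the link between ensemble moments and the sample moments of a single realization can be seen in a small simulation. The sketch below is an illustration with assumed Gaussian shocks and hypothetical function names. For a stationary AR(1) the sample variance of one long realization settles near the ensemble variance $\sigma_\varepsilon^2/(1 - a^2)$; for a random walk it is of the order of the sample length and does not estimate any fixed quantity.

```python
import random


def simulate(n, a, seed):
    """x_t = a * x_{t-1} + eps_t with x_0 = 0; a = 1 gives a random walk."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def sample_variance(x):
    """Time-average variance of a single realization."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


if __name__ == "__main__":
    a = 0.5
    print("ensemble variance, AR(1):", round(1.0 / (1.0 - a * a), 3))
    print("sample variance, AR(1):  ", round(sample_variance(simulate(20000, a, 0)), 3))
    for n in (1000, 4000, 16000):
        print("sample variance, random walk, n =", n, ":",
              round(sample_variance(simulate(n, 1.0, 0)), 1))
```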

Given that a stochastic trend is involved in the real output, it has been attributed to supply shocks, especially shocks to the productivity growth, in Nelson and Plosser (1982), Shapiro and Watson (1988), Blanchard and Quah (1989), and King et al. (1991). An impact of shock at each time is maintained eternally, and its accumulation produces a stochastic trend. Durlauf (1989) objects to this interpretation on the ground that real outputs of different sectors are in fact co-integrated. (If the stochastic trends are due to productivity shocks, they are sector-specific, and cannot be co-integrated.) However, it can be argued that effects of innovations are interrelated among different sectors. It seems to me that there is no economic interpretation of stochastic trend that is unanimously agreed upon at the present time.

Shapiro and Watson (1988) suggest that the presence or absence of a stochastic trend in the real output decides whether the real business cycle theory or the Keynesian theory should be accepted. DeLong and Summers (1988) deny the validity of such a link, and develop a theory in which the business cycle is unrelated to the stochastic trend. Then the stochastic trend is included in the trend, which is the potential growth of output. Whether or not the central line exists in the real output, real wage, real rate of interest, etc. has a fundamental influence upon macroeconomic theories, but the mode of the influence would perhaps depend upon the models.

1.3.2 Long-run relationships

The concept of stochastic trend has developed a new, important role of econo-metrics in investigating the empirical validity of long-run economic relation-ships. This development has been taken in three steps. First, Granger andNewbold (1974, 1986: 207) presented evidence to show that between two inde-pendent stochastic trends the regression coefficient estimate does not converge tozero in probability.9 This confirms an aspect of the well-known nonsense correla-tions between level variables, but it may also be destructive evidence because animportant task of econometrics is to discern the nonsense and the sensible corre-lations. Incidentally, the statisticians' approach represented by Box and Jenkins(1970) was to eliminate the stochastic trends by differencing. The second stepin a study of long-run relationships was taken by Davidson et al. (1978) andHendry and Mizon (1978). They thought that differencing would throw awayan important part of the information about the sensible long-run relationships.

8 To be fair I should say that Shiller (1981a) is a quite elaborate study. The only defect is that it ignores the fact that the stock price might very well be difference stationary.

9 Later Phillips (1986) proved this mathematically.


Instead, they proposed the error-correction model as a device to emphasize the long-run relationships. The model involves current errors in long-run relationships in levels, and describes how the errors set forth adjustments in the next period. The third step was by Granger (1981), Granger and Weiss (1983), and Engle and Granger (1987). They developed the theory of co-integration,10 and thereby provided a theoretical foundation to discern the nonsense correlations and the sensible long-run relationships.
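The nonsense-correlation phenomenon mentioned in the first step can be illustrated by a small simulation. The following is only a sketch under stated assumptions: Gaussian random walks, an OLS regression with intercept, the conventional 1.96 cut-off, and a hypothetical function name `spurious_rejection_rate`; none of these details is taken from Granger and Newbold's original design.

```python
import numpy as np

def spurious_rejection_rate(n_obs=100, n_rep=500, levels=True, seed=0):
    """Share of replications with |t| > 1.96 for the slope when one series
    is regressed on another, independent one (intercept included).

    levels=True  -> both series are random walks (stochastic trends)
    levels=False -> both series are the i.i.d. increments themselves
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        y = rng.standard_normal(n_obs)
        x = rng.standard_normal(n_obs)
        if levels:
            y, x = np.cumsum(y), np.cumsum(x)
        X = np.column_stack([np.ones(n_obs), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n_obs - 2)          # residual variance
        cov = s2 * np.linalg.inv(X.T @ X)         # usual OLS covariance
        t_slope = beta[1] / np.sqrt(cov[1, 1])
        rejections += abs(t_slope) > 1.96
    return rejections / n_rep

print("levels     :", spurious_rejection_rate(levels=True))
print("differences:", spurious_rejection_rate(levels=False))
```

In runs of this kind the conventional t-test rejects far more often than its nominal rate when the two independent series are in levels, while the rejection rate for the differenced series stays near the nominal level.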

This new econometrics deserves special emphasis. For some years before its development econometricians had faced a sort of deadlock. Economic theories have very little to suggest on short-run relationships often represented by distributed lags. They are difficult to ascertain, and the relations do not seem to be stable over different periods. On the other hand, economic theories have established a number of hypotheses on long-run relations. Equipped with simultaneous-equations models econometricians had analysed long-run relations by simulations over extended horizons, but they lacked appropriate concepts and tools to cope with the stochastic and the deterministic trends.

1.4 Overview of Part I

Part I examines whether or not a stochastic trend is present in economic time series, treating each series separately. A large part is devoted to the explanation of various methods, but the results of applications of these methods are also our main concern. Chapter 2 sets forth definitions of trend stationarity and difference stationarity in terms of the absence or presence of the long-run component and also in terms of the dominating root of characteristic polynomials. Methods to examine the absence or the presence of the long-run component are introduced in Chapter 3. The difficulty with the non-parametric variance ratio approach is explained by the spectral theory, as it is also found useful to describe difficulties in the Schwert moving average (MA) regarding non-parametric estimations of long-run variance. The absence of the long-run component is equivalent to the presence of an MA unit root in differenced series. The classical hypothesis testing for trend stationarity against difference stationarity can be performed by testing for the presence of an MA unit root in differenced series. From Nyblom and Makelainen (1983), Nabeya and Tanaka (1988), Tanaka (1990, 1995a), Kwiatkowski et al. (1992), and Saikkonen and Luukkonen (1993a), who contributed to this development, I follow primarily the last mentioned because they adopt a parametric approach avoiding non-parametric estimations of long-run variance.

Chapters 4-7 form the core of Part I in regard to the development of a group of statistical methods. The new asymptotic theories based upon the Wiener process have been emphasized and extended for econometric applications in Phillips (1987), Phillips and Perron (1988), and Ouliaris, Park, and Phillips (1989). They are introduced without mathematical rigour in Chapter 4 in the case where $\Delta x_t$ is i.i.d. with zero mean, and in Chapter 6 in the case where the model of $\{\Delta x_t\}$ is more general. The asymptotic distribution of the MA unit-root test statistic is also presented here. The mathematical results in Chapters 4-7 are summarized at the end of Chapter 7. The classical hypothesis testing for difference stationarity against trend stationarity has been formulated in terms of the dominating characteristic root. The augmented Dickey-Fuller method is described in fair detail in Chapter 5 for the case where $\Delta x_t$ is i.i.d., and in Chapter 7 for the case where $\Delta x_t$ is serially correlated. The method treats difference stationarity as the null hypothesis, but in my judgement this must be combined with an investigation in which trend stationarity is the null hypothesis as mentioned earlier. Moreover, as Perron (1989) emphasized, the discrimination between the difference and the trend stationarity requires appropriate selection of models for deterministic trends. Plausible models are linear trend without a structural change, with structural changes, and polynomial trends. Thus the whole problem should be formulated as model selection regarding both the dominating characteristic root and the modes of deterministic trends. The encompassing principle developed in Mizon (1984) and the general-to-specific principle proposed in Mizon (1977) and Hendry (1979) are found useful in organizing our thinking for model selection along the classical, i.e. non-Bayesian method. Some essence of this thought will be given in Chapter 8. In general the test can discriminate two non-nested models only when they are sufficiently separated, and this will be investigated in Chapter 8 regarding the simplest of difference-stationary and trend-stationary models. The results of the encompassing analysis of macroeconomic time series are shown in Chapter 9. The Bayesian discrimination is introduced at some length in Chapter 10. Various approaches and their results are assessed while emphasizing difficulties due to the point null hypothesis.

10 Earlier Aoki (1968) and Box and Tiao (1977) had ideas close to the co-integration. See Campbell and Shiller (1988) and Aoki (1990) on Aoki (1968).

Before presenting our conclusions it must be pointed out that with sample size 100 characteristic roots between 0.9 and 1.0 cannot be distinguished from a unit root as will be shown in Chapter 8. With this reservation about the practical meaning of difference and trend stationarity our conclusions are given in the following two statements:

1. All kinds of macroeconomic variables, real variables, prices, and financial variables, are difference stationary in the post-Second World War quarterly data in the USA. Since it is mostly the post-war data that are used for studying macroeconomics, the new unit-root econometrics is indispensable to empirical studies of macroeconomics. The price and money stock might even be $I(2)$ rather than $I(1)$.

2. As for the historical data covering eighty or more years in the USA, it is difficult to offer unequivocal judgement as to whether many real economic variables are difference stationary or trend stationary. The analysis is hampered by uncertainty about the selection of modes of deterministic trends and also by deviations of possibly appropriate models from the standard time-series model specification. Unemployment rate and real rate of interest are trend stationary, but real GNP and real wage appear to be near a boundary between the trend and difference stationarity. There is not much that we can propose on the general theory of macrodynamics regarding the self-restoring capability of trend line.

The following topics are left out in Part I though they are undoubtedly related to the outline mentioned above.

(a) A large body of literature concerning mathematical analyses of the near unit roots is left out. The analyses contribute to analysing the power of tests for the difference stationarity and also the size of tests for the trend stationarity, but a rough outlook of the power and the size has been obtained by Monte Carlo simulation studies. More important is a possible contribution of the near unit-root analysis to modelling the reality for which the precise unit root has been tried in vain, for example, the time-varying parameter in the regression and AR models, the seasonal fluctuations, etc. Not much investigation has been done to the best of my knowledge.

(b) At the time of writing there are appearing a number of tests for difference stationarity other than the augmented Dickey-Fuller test. I have restricted my exposition to the augmented Dickey-Fuller test because it best fits into the framework of the encompassing principle.

(c) The fractional differencing model provides bridges between $I(0)$ and $I(1)$ and between $I(1)$ and $I(2)$. It is not introduced in the book because its usefulness in economic time-series analysis seems yet unclear to me.11

(d) The unit-roots representation of seasonal variations is not introduced because I doubt if seasonal variations are properly modelled by the (complex) unit roots.12 Deviations of seasonal fluctuations from the deterministic periodicity are bounded in probability at any time, and the (complex) unit roots are inappropriate to model the deviations.

(e) Beaudry and Koop (1993) find that impulse-response functions of positive and negative shocks are different, negative shocks being less persistent than positive shocks. The finding suggests that some non-linear time-series models should be adopted. The present book does not consider this relatively new field, and readers are referred to Granger and Terasvirta (1993).

11 Seeking for the minimum real number, $d$, that makes $(1 - L)^d x_t$ a stationary ARMA, $x_t$ is said to have long memory if $0.5 > d > 0$. It is stationary, but the impulse-response function goes to zero more slowly than a decaying exponential when the horizon extends to infinity (see Granger and Joyeux (1980) and Hosking (1981)). Diebold, Husted, and Rush (1991) find that a long-memory model fits well the annual data of real exchange-rates under the gold standard system. The long-memory model for $\Delta x_t$ is a bridge between $I(1)$ and $I(2)$ for $x_t$. Cheung (1993) finds evidence for it in weekly data of exchange rates and monthly series of ratios of prices among different countries, whereas Lo (1991) finds no evidence in the stock price.

12 Hylleberg et al. (1990) present what seems to be the best analysis of seasonal variation as well as the testing for seasonal unit roots. Using this test Osborn (1990) and Beaulieu and Miron (1993) showed that the seasonal unit roots were not found in the UK and USA data respectively. However, this result has been objected to by Hylleberg (1994). See a nice survey in Ghysels (1994).


2 Trend Stationarity vs. Difference Stationarity

The present chapter introduces difference stationarity and trend stationarity as two non-nested, non-separate hypotheses. After the concepts are defined in Section 2.1, two parameters are introduced to discriminate the two hypotheses. Perhaps less known of the two parameters is a measure of the long-run component, and it will be explained at great length in Section 2.2. It represents the trend stationarity as an MA unit root in $\Delta x_t$ and also as a limit of a sequence of the difference-stationary models. The better known is the dominating root of the characteristic polynomial for $x_t$, which will be explained in Section 2.3. It represents the difference stationarity as a limit of a sequence of trend-stationary models. These views in Sections 2.2 and 2.3 are combined in Section 2.4. Finally, the data relevant to the discrimination between the difference and trend stationarity will be explained in Section 2.5.

2.1 Basic Discrimination

One of the hottest debates in recent years has been trend stationarity (TS) vs. difference stationarity (DS) posed in Nelson and Plosser (1982). For a single economic variable, $x_t$, let $d_t$, $s_t$, and $c_t$ be respectively the deterministic trend, stochastic trend, and stationary component in $x_t$. The TS hypothesis asserts

$$x_t = d_t + c_t,$$

whereas the DS hypothesis proposes

$$x_t = (d_t) + s_t + c_t,$$

where placing $d_t$ in parentheses means that the inclusion of $d_t$ is allowed but not required. In fact, deterministic trends are contained in many economic time series, and their existence is seldom an issue. The difference between TS and DS is that $s_t$ is not in TS but is in DS. In other words the question posed is not whether trends are deterministic or stochastic but whether the stochastic trend is present or absent in economic time series.

The two hypotheses are differentiated also in regard to what needs to be done in order to achieve the stationarity. In the TS hypothesis it is subtraction of a deterministic function of time. In the DS hypothesis the subtraction is not enough, and differencing is required. Incidentally, the stationarity is a special case of the trend stationarity.
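The contrast between subtracting a trend and differencing can be seen in a small experiment. The sketch below rests on illustrative assumptions that are not part of the text: a linear deterministic trend with slope 0.1, an i.i.d. Gaussian stationary component for TS, a Gaussian random walk as the stochastic trend for DS, and a hypothetical helper name `detrended_variance`.

```python
import numpy as np

def detrended_variance(kind, n_obs, n_rep=200, seed=0):
    """Average sample variance of residuals after OLS removal of a linear
    trend, for simulated TS ("TS") or DS (any other value) series."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_obs + 1)
    T = np.column_stack([np.ones(n_obs), t])      # intercept and linear trend
    values = []
    for _ in range(n_rep):
        eps = rng.standard_normal(n_obs)
        # TS: stationary component only; DS: accumulated shocks (random walk)
        stochastic_part = eps if kind == "TS" else np.cumsum(eps)
        x = 0.1 * t + stochastic_part
        coef, *_ = np.linalg.lstsq(T, x, rcond=None)
        values.append(np.var(x - T @ coef))
    return float(np.mean(values))

for n in (100, 400):
    print(n, detrended_variance("TS", n), detrended_variance("DS", n))
```

Under these assumptions the detrended TS residuals keep roughly the same variance as the sample grows, whereas the detrended DS residuals have a variance that keeps growing with the sample length, which is why subtraction of a deterministic function of time is not enough in the DS case.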


The precise model of deterministic trend is left unspecified here, and insightful readers will anticipate confusions arising in implementing the discrimination between DS and TS. We shall defer the discussion to Chapter 8.

The integration order needs to be redefined with the introduction of deterministic trends. $\{x_t\}$ is $I(1)$ when $\{\Delta x_t\}$ is TS but $\{x_t\}$ is not. $\{x_t\}$ is $I(2)$ if $\{\Delta^2 x_t\}$ is TS but $\{\Delta x_t\}$ is not. A quadratic deterministic trend plus a random walk, for example,

$$x_t = \mu_0 + \mu_1 t + \mu_2 t^2 + \sum_{s=1}^{t} \varepsilon_s,$$

is $I(1)$, because $\{\Delta x_t\}$ is TS even though it has a linear trend.

There are three purposes for discrimination between DS and TS. First, the discrimination is the purpose by itself for the understanding of economic dynamics. Especially we are concerned with existence of the central line which the economy is expected to restore after any temporary deviations from it. The deterministic trend cannot be the central line with this property if the time series also contains a stochastic trend along with a deterministic trend. The second purpose of the discrimination is a sort of pre-test necessitated by the fact that different results of the discrimination lead to different inference procedures in subsequent, main parts of empirical studies. An example is the determination of integration orders prior to the co-integration analysis. The third purpose is to see which of DS and TS models performs better in outside sample forecasting. To the best of my knowledge this type of discrimination has not been implemented, nor is it attempted in the present book.1

Both the first and the second purposes investigate the congruence of DS and TS models with observations. For the first purpose the macroeconomic theory prescribes what data to analyse, and the discrimination is desired to be as sharp as possible. For the second purpose the data are selected in the main parts of the studies, and one would be concerned with the specification error regarding how the subsequent studies are affected by erroneous decisions on the discrimination between DS and TS. Effects of erroneous decisions are investigated in Durlauf and Phillips (1988) and Banerjee et al. (1993: 81-93), but a great deal more needs to be done.

There exist two methods to distinguish the DS and TS. One is in terms of the presence or absence of a long-run component, and the other is whether the dominating root of the characteristic polynomial is equal to or less than unity.

2.2 Long-Run Component

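Beveridge and Nelson (1981) have presented the concept of a long-run component, which was later found indispensable to all the conceptual and technical developments on the stochastic trend. Given a scalar stochastic process $\{x_t\}$ that is either $I(1)$ or $I(0)$, suppose that $\{\Delta x_t\}$ is a linear process with zero mean, i.e.

$$\Delta x_t = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + \dots, \tag{2.1}$$

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. Here it is assumed that the process starts at $t = -\infty$. Suppose that we are now at time $t$. The optimal prediction (i.e. minimum mean-square prediction) of $\Delta x_{t+i}$ $(i > 0)$ from $(x_t, x_{t-1}, \dots)$ is

$$E(\Delta x_{t+i} \mid x_t, x_{t-1}, \dots) = E(\Delta x_{t+i} \mid \varepsilon_t, \varepsilon_{t-1}, \dots) = b_i \varepsilon_t + b_{i+1} \varepsilon_{t-1} + \dots, \tag{2.2}$$

where the first equality follows from the fact that the projection of $\Delta x_{t+i}$ upon the space spanned by $x_t, x_{t-1}, \dots$ is identical to that upon the space spanned by $\varepsilon_t, \varepsilon_{t-1}, \dots$. The optimal prediction of $x_{t+k}$ $(k > 0)$ from $(x_t, x_{t-1}, \dots)$ is

$$x_t + \sum_{i=1}^{k} E(\Delta x_{t+i} \mid x_t, x_{t-1}, \dots) = x_t + \Big(\sum_{i=1}^{k} b_i\Big) \varepsilon_t + \Big(\sum_{i=2}^{k+1} b_i\Big) \varepsilon_{t-1} + \dots. \tag{2.3}$$

Let us consider the prediction of the infinitely remote future by taking $k$ to $+\infty$ while fixing $t$,

$$\hat{x}_t \equiv x_t + \Big(\sum_{i=1}^{\infty} b_i\Big) \varepsilon_t + \Big(\sum_{i=2}^{\infty} b_i\Big) \varepsilon_{t-1} + \dots. \tag{2.4}$$

Do this prediction at each time, $t$, and form the stochastic process, $\{\hat{x}_t\}$. It is easily seen that

$$\Delta \hat{x}_t = \hat{x}_t - \hat{x}_{t-1} = \Big(\sum_{i=0}^{\infty} b_i\Big) \varepsilon_t; \tag{2.5}$$

therefore $\{\hat{x}_t\}$ is a pure random walk unless $\sum_{0}^{\infty} b_i = 0$. A more compact derivation will be introduced later with lag operator, and what $\sum_{0}^{\infty} b_i = 0$ means will also be explained later.

We shall call $\hat{x}_t$ and $\Delta \hat{x}_t$ the long-run components of $x_t$ and $\Delta x_t$ respectively. The variance of $\Delta \hat{x}_t$, i.e. $\big(\sum b_i\big)^2 \sigma_\varepsilon^2$, is called the long-run variance. (2.5) indicates how the long-run component is revised in the light of the new information given at time $t$.

Several remarks will be presented concerning the meaning of $\hat{x}_t$.

(i) As seen from (2.2) the impact of $\varepsilon_t$ in (2.1) upon $(\Delta x_{t+1}, \Delta x_{t+2}, \dots)$ is $(b_1, b_2, \dots)$. As seen from (2.3) the impact of $\varepsilon_t$ upon $(x_{t+1}, x_{t+2}, \dots)$ is $(b_1, b_1 + b_2, \dots)$. $\sum_{0}^{\infty} b_i$ is the impact upon $x_{t+\infty}$, i.e. the limit of the impulse-response function as the horizon extends to infinity.

(ii) Suppose that $\{x_t\}$ is a stationary linear process so that

$$x_t = c_0 \varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + \dots. \tag{2.6}$$

Then

$$\Delta x_t = c_0 \varepsilon_t + (c_1 - c_0) \varepsilon_{t-1} + (c_2 - c_1) \varepsilon_{t-2} + \dots,$$

and in relation to (2.1) $b_0 = c_0$, $b_1 = c_1 - c_0, \dots$, from which it follows that $\sum_{0}^{\infty} b_i = 0$. The long-run component of $\Delta x_t$ vanishes. In fact, being a stationary linear process, no part of $x_t$ has a permanent impact upon its future.

(iii) Unless $\sum b_j = 0$, $\{\hat{x}_t\}$ and hence $\{x_t\}$ are non-stationary, and include a stochastic trend. In regard to the model (2.1) the TS is $\sum b_i = 0$, and the DS is $\sum b_i \neq 0$.

1 Clive Granger has proposed the discrimination in terms of the forecasting capability.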

I have not bothered to consider convergence conditions on infinite series involved in expressions (2.1)-(2.6). If $\Delta x_t$ is a stationary ARMA$(p, q)$ (autoregressive moving-average process of order $(p, q)$) with autoregressive part $1 + a_1 L + \dots + a_p L^p$, and if the AR part is inverted to get the form (2.1), then $\{b_j\}$ is bounded by a decaying exponential, by which I mean the existence of $c$ such that $1 > c > 0$ and $|b_j| < c^j$, $j = 0, 1, \dots$. This is because the roots of the equation, $1 + a_1 z + \dots + a_p z^p = 0$, all have absolute values larger than unity, or, the roots of $z^p + a_1 z^{p-1} + \dots + a_p = 0$ all have absolute values smaller than unity. If $\{b_j\}$ is bounded by a decaying exponential all the infinite series in (2.1)-(2.6) converge. Therefore there should be no worry about the convergence if $\Delta x_t$ is a stationary ARMA$(p, q)$.

The major points in the above derivation can be reproduced with lag operator, $L$. The initial values will also be introduced to make the reasoning more useful for the inference based on a finite amount of data. The right-hand side of (2.1) is $b(L)\varepsilon_t$, where $b(L) = b_0 + b_1 L + b_2 L^2 + \dots$. Substituting unity for $L$, $\sum_{0}^{\infty} b_i$ may be written $b(1)$, which is a very common practice in the literature. Thus the long-run component of $\Delta x_t$ in (2.5) is written $b(1)\varepsilon_t$. From

$$b(L) - b(1) = b_1 (L - 1) + b_2 (L^2 - 1) + \dots = -(1 - L)\,[\,b_1 + b_2 (1 + L) + b_3 (1 + L + L^2) + \dots\,]$$

we get a useful identity

$$b(L) = b(1) + (1 - L)\, b^*(L), \tag{2.7}$$

where

$$b^*(L) = b_0^* + b_1^* L + b_2^* L^2 + \dots, \qquad b_j^* = -\sum_{i=j+1}^{\infty} b_i.$$

With this identity (2.1) is

$$\Delta x_t = b(1)\varepsilon_t + (1 - L)\, b^*(L)\varepsilon_t.$$

This gives

$$x_t = b(1) \sum_{s=1}^{t} \varepsilon_s + b^*(L)\varepsilon_t + x_0 - b^*(L)\varepsilon_0. \tag{2.8}$$

Here the DGP (data-generating process) still starts from $t = -\infty$, but (2.8) has adopted $t = 0$ as an initial time in order to condition our analysis upon $x_0, x_{-1}, \dots$, or $\varepsilon_0, \varepsilon_{-1}, \dots$. In fact the random-walk process cannot be analysed unless so conditioned. We might then alter the previous definition of the long-run component of $x_t$ slightly. The new definition is

$$\hat{x}_t \equiv b(1) \sum_{s=1}^{t} \varepsilon_s.$$

Then (2.8) shows that $x_t$ is decomposed into the long-run component, $b(1) \sum_{s=1}^{t} \varepsilon_s$, the short-run component, $b^*(L)\varepsilon_t$, and the initial term, $x_0 - b^*(L)\varepsilon_0$. The initial term is independent of the newly defined long-run component. The analysis of the long-run component conditioned upon the initial term is identical to the unconditioned analysis of the long-run component newly defined.

Incidentally, the old expression of long-run component of $x_t$ is (2.4), which is

$$x_t - b^*(L)\varepsilon_t = b(1) \sum_{s=1}^{t} \varepsilon_s + x_0 - b^*(L)\varepsilon_0,$$

so that the old and the new expressions differ in regard to $x_0 - b^*(L)\varepsilon_0$. It still holds for both expressions that

$$\Delta \hat{x}_t = b(1)\varepsilon_t.$$

We might set $\hat{x}_0 = 0$ to initiate the random walk $\{\hat{x}_t\}$ in the new definition.

Point (ii) given in connection with (2.6) can be rephrased more compactly. Suppose that $\{x_t\}$ is stationary and that $x_t = c(L)\varepsilon_t$ with $c(L) = c_0 + c_1 L + \dots$. Then $\Delta x_t = (1 - L)c(L)\varepsilon_t$ so that $b(L) = (1 - L)c(L)$. It is immediately seen that $b(1) = 0$ because $b(L)$ has a factor, $(1 - L)$, which vanishes when unity is substituted for $L$. The TS is $b(1) = 0$, and the DS is $b(1) \neq 0$.
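The lag-operator identity and the TS/DS criterion in terms of $b(1)$ can be checked numerically for a finite-order $b(L)$. The following is a minimal sketch; the function names `bn_polynomials` and `rebuild_b` and the two example coefficient vectors are illustrative assumptions, not anything taken from the text.

```python
import numpy as np

def bn_polynomials(b):
    """Return b(1) and the coefficients of b*(L) for a finite b(L).

    b = [b_0, b_1, ..., b_n] with at least two coefficients.
    b*_j = -(b_{j+1} + b_{j+2} + ...), j = 0, ..., n - 1.
    """
    b = np.asarray(b, dtype=float)
    b_one = b.sum()
    b_star = -(b_one - np.cumsum(b))[:-1]
    return b_one, b_star

def rebuild_b(b_one, b_star):
    """Coefficients of b(1) + (1 - L) b*(L), to be compared with b(L)."""
    coeffs = np.convolve([1.0, -1.0], b_star)   # (1 - L) b*(L)
    coeffs[0] += b_one
    return coeffs

# DS example: Delta x_t = eps_t + 0.4 eps_{t-1}, so b(1) = 1.4
b_ds = [1.0, 0.4]
# TS example: x_t = c(L) eps_t is stationary, so b(L) = (1 - L) c(L)
c = [1.0, 0.5, 0.25]
b_ts = np.convolve([1.0, -1.0], c)

for name, b in (("DS", b_ds), ("TS", b_ts)):
    b_one, b_star = bn_polynomials(b)
    print(name, "b(1) =", b_one, "b* =", b_star,
          "identity holds:", np.allclose(rebuild_b(b_one, b_star), b))
```

In the TS example $b(1)$ is zero and $b^*(L)$ coincides with $c(L)$, so the whole of $x_t$ is short-run component, in line with the rephrased point (ii).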

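The long-run variance of $\Delta x_t$ is $\big(\sum_{0}^{\infty} b_i\big)^2 \sigma_\varepsilon^2$. This may be represented in a number of expressions.

(i) Recall that the spectral density function of (2.1) is

$$f(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi} \Big| \sum_{j=0}^{\infty} b_j \exp(-i\lambda j) \Big|^2,$$

where $i = \sqrt{-1}$ and $\exp(i\lambda) = \cos\lambda + i \sin\lambda$. (Appendix 1 is available for readers who want to brush up their memory about the spectral method.) Setting $\lambda = 0$ we see that the long-run variance of $\Delta x_t$ is $(2\pi) \times$ (the spectral density at zero frequency).

(ii) The long-run variance of $\Delta x_t$ is $\lim_{t \to \infty} t^{-1} E(x_t^2)$. To see this suppose that $x_0 = 0$. Then $x_t = \Delta x_t + \Delta x_{t-1} + \dots + \Delta x_1$, and

$$t^{-1} E(x_t^2) = t^{-1} E\Big(\sum_{s=1}^{t} \sum_{r=1}^{t} \Delta x_s \Delta x_r\Big) = \gamma_0 + t^{-1}(t - 1)\, 2\gamma_1 + t^{-1}(t - 2)\, 2\gamma_2 + \dots + t^{-1}\, 2\gamma_{t-1},$$

where $\gamma_j = E(\Delta x_t \Delta x_{t-j})$, i.e. the autocovariance of $\Delta x_t$ for lag $j$. Using the Cesaro sum it can be shown that, as $t \to \infty$, $\gamma_0 + t^{-1}(t - 1)\, 2\gamma_1 + t^{-1}(t - 2)\, 2\gamma_2 + \dots$ converges to $\gamma_0 + 2\gamma_1 + 2\gamma_2 + \dots$, which is $(2\pi) \times$ (the spectral density at zero frequency). Finally, the assumption, $x_0 = 0$, has no effect upon the above result since $x_0$ is independent of $\varepsilon_t$, $t > 0$.

(iii) The important relation

$$\gamma_0 + 2\gamma_1 + 2\gamma_2 + \dots = \Big(\sum_{j=0}^{\infty} b_j\Big)^2 \sigma_\varepsilon^2 \tag{2.9}$$

can be directly proved on the model (2.1). First,

$$\gamma_j = \sigma_\varepsilon^2 \sum_{i=0}^{\infty} b_i b_{i+j}. \tag{2.10}$$

Summing (2.10) over the lags as in the left-hand side of (2.9) collects every square $b_i^2$ once and every cross-product $b_i b_k$ $(i \neq k)$ twice, which is $\big(\sum b_j\big)^2 \sigma_\varepsilon^2$.

Incidentally the long-run variance, i.e. the variance of $\Delta \hat{x}_t$, is equal to the variance of $\Delta x_t$ if and only if $\sum b_j^2 = \big(\sum b_j\big)^2$. Among a number of processes in which this condition holds, the i.i.d. process is the most important one.

Later in Chapter 3 a special case of (2.6) will play an important role. Suppose that $\{x_t\}$ is a moving-average process of order $q$, MA$(q)$,

$$x_t = c_0 \varepsilon_t + c_1 \varepsilon_{t-1} + \dots + c_q \varepsilon_{t-q}, \tag{2.6'}$$

which is of course stationary. Then

$$\Delta x_t = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + \dots + b_{q+1} \varepsilon_{t-q-1}, \tag{2.11}$$

where $b_0 = c_0$, $b_j = c_j - c_{j-1}$ $(j = 1, \dots, q)$, and $b_{q+1} = -c_q$. The equation

$$b_0 + b_1 z + \dots + b_{q+1} z^{q+1} = 0 \tag{2.12}$$

has a unit root, which is called the MA unit root. Recall that for the general MA$(q + 1)$ model the admissible domain of $(b_0, b_1, \dots, b_q, b_{q+1})$ is restricted to those for which no roots of (2.12) are less than unity in moduli. This is to avoid the lack of identifiability. Also recall that the standard time-series analysis restricts the domain further to those for which all the roots exceed unity in moduli, i.e. what is called the invertibility region. The MA unit root lies on the boundary of the admissible region, which is in fact a point of the non-invertibility region. In the model (2.6') the TS is that (2.12) has a real unit root, and the DS is that all roots of (2.12) exceed unity in moduli.

It might be advisable to say that $\{\Delta x_t\}$ is $I(-1)$ in order to emphasize the point that $b(1)$ for $\Delta x_t$ is zero if $\{x_t\}$ is stationary. This has been suggested in Granger (1986).

One can introduce a simplest type of deterministic trend in $\{x_t\}$ by adding $\mu$ to the right-hand side of (2.1),

$$\Delta x_t = \mu + b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + \dots. \tag{2.1'}$$

(2.8) is now replaced by

$$x_t = \mu t + b(1) \sum_{s=1}^{t} \varepsilon_s + b^*(L)\varepsilon_t + x_0 - b^*(L)\varepsilon_0. \tag{2.8'}$$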
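The equivalent expressions for the long-run variance given in this section can be compared numerically for a finite moving average. The sketch below assumes an arbitrary example $b(L) = 1 + 0.5L - 0.3L^2$ with $\sigma_\varepsilon^2 = 1$; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def autocovariances(b, sigma2=1.0):
    """gamma_j = sigma2 * sum_i b_i b_{i+j} for a finite MA, j = 0..len(b)-1."""
    b = np.asarray(b, dtype=float)
    n = len(b)
    return np.array([sigma2 * np.dot(b[: n - j], b[j:]) for j in range(n)])

def spectral_density(b, lam, sigma2=1.0):
    """f(lambda) = sigma2 / (2 pi) * |sum_j b_j exp(-i lambda j)|^2."""
    b = np.asarray(b, dtype=float)
    j = np.arange(len(b))
    return sigma2 / (2 * np.pi) * abs(np.sum(b * np.exp(-1j * lam * j))) ** 2

def scaled_variance(b, t, sigma2=1.0):
    """t^{-1} E(x_t^2) with x_0 = 0: gamma_0 + 2 sum_j (1 - j/t) gamma_j."""
    g = autocovariances(b, sigma2)
    j = np.arange(1, len(g))
    weights = np.clip(1.0 - j / t, 0.0, None)   # lags beyond t - 1 drop out
    return g[0] + 2.0 * np.sum(weights * g[j])

b = [1.0, 0.5, -0.3]
g = autocovariances(b)
print("(sum b)^2 sigma^2       :", sum(b) ** 2)
print("gamma_0 + 2 sum gamma_j :", g[0] + 2 * g[1:].sum())
print("2 pi f(0)               :", 2 * np.pi * spectral_density(b, 0.0))
print("t^-1 E(x_t^2), t = 1000 :", scaled_variance(b, 1000))
```

The first three numbers agree up to rounding, and the fourth approaches them as $t$ grows, which is the Cesaro-sum argument in (ii).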


2.3 Dominating Root of Characteristic Polynomial

Another parameter useful for differentiating TS and DS is the dominating root of the characteristic polynomial. I begin with different expressions being used in the economic dynamics and time-series analysis.

In the economic dynamics a homogeneous, linear difference equation

$$x_t - a_1 x_{t-1} - \dots - a_p x_{t-p} = 0 \tag{2.13}$$

is said to be stable if all roots of the corresponding characteristic polynomial

$$\lambda^p - a_1 \lambda^{p-1} - \dots - a_p = 0 \tag{2.14}$$

are less than unity in moduli, and explosive if at least one root of (2.14) exceeds unity in moduli. Also the root is said to be stable or explosive. The case where real or complex unity is the root largest in moduli is the borderline between the stability and the explosion.

In time-series analysis an autoregressive process with a possibly infinite order is written as

$$x_t - a_1 x_{t-1} - a_2 x_{t-2} - \dots = \varepsilon_t.$$

Define $f(z) = 1 - a_1 z - \dots$, where $z$ is complex-valued. It is often assumed that $f(z) = 0$ never occurs when $z$ is less than unity in moduli. If the order of the autoregressive process is finite, say, $p$, this assumption means that the equation $f(z) = 1 - a_1 z - \dots - a_p z^p = 0$ has no roots less than unity in moduli, and it corresponds to absence of explosive roots in the characteristic polynomial (2.14). In the Box-Jenkins modelling of time series $f(z) = 1 - a_1 z - \dots$ is not zero also when $z$ is complex with unit modulus. In the AR$(p)$ this corresponds to absence of complex unit roots in the characteristic polynomial. In the Box-Jenkins modelling $f(z)$ may be zero at $z =$ real unity. Then with some positive integer $r$

$$f(z) = (1 - z)^r \tilde{f}(z), \tag{2.15}$$

where $\tilde{f}(z) = 1 - \tilde{a}_1 z - \dots$ cannot vanish when $z$ is less than or equal to unity in moduli. In the AR$(p)$ this corresponds to the presence of real unit root with multiplicity $r$ and stability in regard to all other roots in the characteristic polynomial.

In time-series analysis the stationary autoregressive process with a possibly infinite order is represented by the condition that $f(z) = 1 - a_1 z - \dots = 0$ never occurs in so far as $z$ is equal to or less than unity in moduli. Then there exists $g(z) = 1 + g_1 z + \dots$ such that $f(z)g(z) = 1$, i.e. $f(z)$ can be inverted.2 Moreover, $g(z) = 0$ never occurs when $z$ is equal to or less than unity in moduli, and $g(z)$ is invertible. In the finite-order autoregressive process $f(z) = 1 - a_1 z - \dots - a_p z^p = 0$ has no roots less than or equal to unity in moduli. Then (2.14) has only stable roots, and $f(z)$ can be inverted, i.e. the difference equation

$$x_t - a_1 x_{t-1} - \dots - a_p x_{t-p} = \varepsilon_t$$

can be solved to get

$$x_t = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + \dots.$$

Here $\{b_j\}$ is bounded by a decaying exponential. (This property does not necessarily hold in an infinite order AR.)

2 A theorem on the analytic function is behind the statement.

In Part I I use the characteristic polynomial (2.13) in order to use the term, dominating root. (I shall switch to the time-series notation in Part II.) The dominating root of the characteristic polynomial is the root that is the largest in moduli. Thus the stationary autoregressive process is characterized by the dominating root of the characteristic polynomial being less than unity in moduli, and the stochastic trend is represented by the dominating root being equal to real unity possibly with some multiplicity. The Box-Jenkins modelling rules out the dominating root being explosive or complex unity.
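The dominating root of a finite-order autoregression can be computed directly from its coefficients. The sketch below assumes the sign convention $x_t = a_1 x_{t-1} + \dots + a_p x_{t-p} + \varepsilon_t$, so that the characteristic polynomial is $\lambda^p - a_1 \lambda^{p-1} - \dots - a_p$; the function name `dominating_root` and the two coefficient vectors are illustrative assumptions.

```python
import numpy as np

def dominating_root(a):
    """Root of largest modulus of lambda^p - a_1 lambda^(p-1) - ... - a_p,
    where a = [a_1, ..., a_p]."""
    # np.roots expects coefficients ordered from the highest power down
    coeffs = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(coeffs)
    return roots[np.argmax(np.abs(roots))]

# (lambda - 1)(lambda - 0.5): a real unit root, i.e. a stochastic trend
print(dominating_root([1.5, -0.5]))
# both roots inside the unit circle: a stationary autoregression
print(dominating_root([0.5, 0.3]))
```

A dominating root equal to real unity corresponds to the stochastic trend, while a modulus below unity corresponds to the stationary case.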

The explosive root is taken into consideration in Bayesian studies as described in Chapter 10, but it is a priori ruled out in the classical inference explained in Chapters 3, 5, 7, and 8. It should be emphasized that the difference stationarity precludes an explosive root. To see this consider $x_t - \rho x_{t-1} = \varepsilon_t$ with $\rho > 1$. Then $\Delta x_t = \varepsilon_t - (1 - \rho)(\varepsilon_{t-1} + \rho \varepsilon_{t-2} + \dots + \rho^{t-2} \varepsilon_1) - (1 - \rho)\rho^{t-1} x_0$, which is non-stationary. For any positive $i$ the $i$th difference of $x_t$ is non-stationary, i.e. $\{x_t\}$ is not $I(i)$ for any $i$.3

The multiplicity of unit roots referred to earlier by $r$ is assumed to be 1 in most of the book. In so far as macroeconomic time series are concerned the integration order of the log of price has been an issue in the literature. Hall (1986) judges that the UK price is $I(2)$, but Clements and Mizon (1991) and Johansen and Juselius (1992) think that it is $I(1)$ with structural changes in the deterministic trends or non-stationarity in the variance. I shall present my results on logs of price and money stock in Chapter 9. Charts (of the residuals after the trend fitting) of $I(2)$ variables show even smoother contour than the case of $I(1)$ variables. If one feels from the chart or otherwise a need to check whether or not $r = 2$, one should see if the series of $\Delta x_t$ has a unit root. $I(2)$ can be tested against $I(1)$ by applying the regression tests given in Chapters 5 and 7 to $\{\Delta x_t\}$ and $I(1)$ can be tested against $I(2)$ by applying the MA unit root test in Chapter 3 to $\{\Delta x_t\}$. The order of integration may not be constant over time in some variables. We cannot be sure of this because the determination of the integration order requires some length of time-span as will be explained in Section 2.5 below.

3 A simple version of the economic theory of bubbles entails an explosive root. An empirical study of Diba and Grossman (1988) on the historical, annual data of stock price (relative to the general price index) casts doubt on the presence of an explosive root. An extended version of the theory of bubbles given in Froot and Obstfeld (1991) does not necessarily imply an explosive root. It is a non-linear model.


The dominating root of the characteristic polynomial is related to $b(1)$ as follows. Suppose that $\{x_t\}$ is AR$(p)$

$$(1 - a_1 L - \dots - a_p L^p)\, x_t = \varepsilon_t.$$

If the factorization

$$1 - a_1 L - \dots - a_p L^p = (1 - L)(1 - \tilde{a}_1 L - \dots - \tilde{a}_{p-1} L^{p-1}) \tag{2.16}$$

is possible, and if $1 - \tilde{a}_1 L - \dots - \tilde{a}_{p-1} L^{p-1}$ is stable, then

$$b(1) = (1 - \tilde{a}_1 - \dots - \tilde{a}_{p-1})^{-1} \neq 0.$$

If the factorization (2.16) is not possible, $b(1) = 0$.

2.4 Non-Separate Hypotheses

In general two hypotheses, $H_1$ and $H_2$, are said to be non-nested if neither $H_1 \supseteq H_2$ nor $H_2 \supseteq H_1$ where $A \supseteq B$ means that $A$ is more general than $B$. The two hypotheses are said to be separate if no models in $H_1$ can be represented as a limit in some sequence of models in $H_2$ and vice versa. See Cox (1961) for the non-nested and separate hypotheses. The trend stationarity and the difference stationarity are non-nested but not separate. In Figure 2.1 an arbitrarily small neighbourhood of $b(1) = 0$ (i.e. TS) contains models of DS. Figure 2.2 is a complex plane for the dominating root of the characteristic polynomial, $\rho$, and the circle is the unit circle. According to the TS $\rho$ lies inside the unit circle, while $\rho$ is real unity according to the DS. In Figure 2.2 an arbitrarily small neighbourhood of $\rho = 1$ (i.e. DS) contains models of TS.

FIG. 2.1 Long-run component

It is also worth noting that the parametrization in terms of $b(1)$ represents a TS model as a limit of a sequence of DS models, while the parametrization in terms of $\rho$ represents a DS model as a limit of a sequence of TS models. It will be found later that $b(1)$ is useful to test for TS against DS, whereas the dominating root of the characteristic polynomial is useful to test for DS against TS.

It is not unusual at all that econometricians wish to compare the congruence of two non-nested, non-separate hypotheses with available observations. The statistical inference, either classic or Bayesian, has been useful for the comparison if parameter values of the highest concern are sufficiently separated, and if available data are well behaved. In relation to the discrimination between DS and TS the problem is investigated in Chapter 8. At this point I only warn against overoptimism about precision in discriminating TS and DS.

2.5 Time Aggregation and Other Remarks on the Data

Earlier the purpose of the discrimination between DS and TS was classified into three. Past studies with the first purpose have established a stylized pattern regarding the time-series data to be analysed, and the data are classified into two groups. The first is a set of annual data covering 70-130 years for a number of macroeconomic variables in the USA, and the annual data for at least real output and a few other variables in a number of other countries. As for US data readers are referred to Nelson and Plosser (1982) and Schotman and van Dijk (1991b) in which the Nelson-Plosser data have been updated to 1988. Kormendi and Meguire (1990) explore the data in countries other than the USA. The second group is the post-Second World War quarterly or monthly data for a large number of variables in all advanced countries, which must be well known to econometricians. Many studies have been carried out on international comparisons on the real outputs. In the present book the first group will be referred to as the historical data, and the second group as the post-war data.

In reviewing each empirical study it is advisable to pay attention to which of the two groups is being analysed. Results of analyses are often different between the two groups of data. The post-war era is different from the pre-war with respect to a number of aspects of economic regimes, and this is seen most clearly in the variance of $\Delta x_t$ being larger in the pre-war period than post-war, where $x_t$ is the real output in log. See also Harvey (1985), DeLong and Summers (1988), and Durlauf (1989). Romer (1986a, 1986b, 1989) questions the comparability of US data between the pre- and post-Second World War periods. Even though she is primarily concerned with the volatility her problem must also be relevant to the stochastic trend.

In spite of these remarks it must be admitted that historical data covering long time-periods are preferred when the presence or absence of stochastic trends is our direct concern as a property of economic dynamics. This is because the persistence of responses to a current impulse even in the infinitely remote future can be verified only with time series covering very long periods.

FIG. 2.2 Dominating roots of characteristic polynomial

The post-war data are available on a quarterly basis, which of course can be aggregated into annual data if so desired. On the other hand the historical data are available on an annual basis only. Let us consider the discrimination between DS and TS in relation to different units of time. In terms of the dominating root of the characteristic polynomial the unit root remains unchanged after time aggregation, but the stable root varies its value with time aggregation. Suppose that $x_t = \rho x_{t-1} + \varepsilon_t$, $1 > |\rho| > 0$, on a quarterly basis, where $\{\varepsilon_t\}$ is i.i.d. with zero mean. Let $y_t = \sum_{i=0}^{3} x_{t-i}$ and $e_t = \sum_{i=0}^{3} \varepsilon_{t-i}$. Then

$$y_t = \rho^4 y_{t-4} + e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \rho^3 e_{t-3}.$$

Picking $y_t$ at every fourth time produces the annual data. Even though it is not AR(1) but ARMA(1,1), the root of the characteristic polynomial for the AR part is $\rho^4$. A DS model with $\rho = 1$ and a TS model with, say, $\rho = 0.95$ on a quarterly basis are separated more distinctly as $\rho = 1$ vs. $\rho = 0.95^4$ (about 0.81) on an annual basis. Turning to the size of the long-run component let us suppose that $\Delta x_t$ is MA($q$). Construct $\Delta y_t = \sum_{i=0}^{3} \Delta x_{t-i}$ and pick up every fourth observation of $\Delta y_t$, which produces the annual data of the differenced series. It is MA($1 + [(q - 1)/4]$). Now suppose that $\{x_t\}$ is TS on the original, quarterly basis so that $b(1) = 0$ in $\{\Delta x_t\}$. Then $b(1) = 0$ in $\{\Delta y_t\}$ but generally $b(1)$ is not zero in the series of every fourth term of $\Delta y_t$, because zero frequency and the frequency that corresponds to the annual cycle cannot be distinguished in the annual data.⁴ Thus what is TS in the quarterly data is turned into DS in the annual data.⁵ Seasonal adjustments do reduce, but do not eliminate the problem. In fact after the seasonal adjustments it is conjectured that the spectral density would resemble that of the Schwert MA to be explained in Section 7.2 below. Incidentally, what is DS in the quarterly data remains as DS in the annual data in terms of the size of long-run component.
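The effect of time aggregation on the stable root can be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original analysis; the function name `aggregated_ar_root`, the seed, and the sample length are illustrative assumptions. It simulates a quarterly AR(1), forms four-quarter sums, and confirms that the summed series obeys an autoregression with coefficient $\rho^4$ at lag four.

```python
import numpy as np

def aggregated_ar_root(rho: float, m: int = 4) -> float:
    """AR root after summing m consecutive periods of an AR(1) with
    coefficient rho and sampling every m-th observation."""
    return rho ** m

rng = np.random.default_rng(0)      # illustrative seed
rho, n = 0.95, 400
eps = rng.standard_normal(n)

# Quarterly AR(1): x_t = rho * x_{t-1} + eps_t, started at x_0 = eps_0.
x = np.zeros(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]

# Four-quarter sums: y[i] = x[i] + ... + x[i+3], e[i] = eps[i] + ... + eps[i+3].
y = np.convolve(x, np.ones(4), mode="valid")
e = np.convolve(eps, np.ones(4), mode="valid")

# Check y_t - rho^4 y_{t-4} = e_t + rho e_{t-1} + rho^2 e_{t-2} + rho^3 e_{t-3}.
lhs = y[4:] - aggregated_ar_root(rho) * y[:-4]
rhs = e[4:] + rho * e[3:-1] + rho**2 * e[2:-2] + rho**3 * e[1:-3]
max_gap = float(np.max(np.abs(lhs - rhs)))
print(aggregated_ar_root(rho), max_gap)
```

The gap between the two sides is only floating-point rounding, so the separation between $\rho = 1$ and $\rho^4$ on the annual basis is an algebraic fact rather than a sampling result.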

4 Let $f(\lambda)$, $\pi > \lambda > -\pi$, be the spectral density function of $\Delta x_t$ so that $f(0) = 0$. Let $g(\lambda) = \|1 + \exp(i\lambda) + \exp(2i\lambda) + \exp(3i\lambda)\|^2$. Then the spectral density function of $\Delta y_t$ is $h(\lambda) = f(\lambda)g(\lambda)$. Note that $h(0) = 0$. The spectral density function of the series picking every fourth term of $\Delta y_t$ is

See, e.g. Fuller (1976: 119). Generally the density at $\lambda = 0$ is not zero because $h(\pi/2)$ is not zero; $\pi/2$ is the frequency that corresponds to the annual cycle.

5 If 4 were an odd number, there would be a simple method to make $h(\pi/2)$ zero and thereby $b(1) = 0$ in the annual data.

In later sections the testing for DS against TS is performed in terms of the dominating root of the characteristic polynomial, and the testing for TS against DS in terms of the size of the long-run component. If the DGP is TS in the quarterly data, the time aggregation separates more clearly the dominating root of the characteristic polynomial from unity, but at the same time it reduces the sample size. Perron (1989b, 1991a) argues that the power of discrimination between DS and TS depends upon the absolute length, say in years, of the period covered by the time-series data but not upon the units of time in the data. Continuing on the same assumption, the time aggregation introduces a model misspecification in terms of the size of the long-run component, and distorts the testing for TS against DS. Weighing these factors it may be concluded that we should use the quarterly data without time aggregation when both the annual and quarterly data are available.

In Chapter 9 we shall analyse both the historical annual data and the post-war quarterly data. The former may involve a model misspecification in terms of $|b(1)|$ but it is outweighed by the advantage due to the longer time-spans that they cover. The historical data seem to be useful for any suggestions on the general theories of macrodynamics. On the other hand the post-war data are used more frequently for econometric studies in general than the historical data. The discrimination between DS and TS in the post-war data is important from the standpoint of the second purpose of the discrimination, i.e. the analyses of inference procedures.

Macroeconomic time-series data used for DS vs. TS discrimination are logarithms of the variables except in the case of interest rates and unemployment rates. Most of the original variables appear to be explosive. Repeated differencing operations do not transform them into the seemingly stationary series, but the logarithmic transformation possibly does with differencing. See Banerjee et al. (1993: 192-9).

In both the historical and the post-war data a typical sample size is 100, and $T = 100$ has been adopted in most simulation studies regarding finite sample distributions of various statistics. In the post-war data $T$ is close to or more than 150, but the discrepancy between the asymptotic and finite sample distributions is not much affected by increasing $T$ from 100 to 150.

3

Discrimination in Terms of the Long-Run Component: A Test for Trend Stationarity

The present chapter deals with a measure of long-run component, $b(1)$. Section 3.1 describes the non-parametric variance ratio that has been widely used to measure the importance of long-run components in economic time series. While admitting its usefulness as a descriptive statistic, I shall argue in Section 3.2 that it cannot be used to discriminate the difference and the trend stationarity. My argument uses the spectral theory, and the theory will be used again in Chapter 7 to explain the Schwert ARMA. Therefore Appendix 1 provides an elementary description of the spectral theory. Section 3.3 offers a brief comment on the time-series decomposition. Section 3.4 explains the MA unit-root test that has been derived as the locally best test on the basis of the invariance principle. Since this general inference theory is relatively new to econometricians, elementary explanations are given in Section 3.4. It turns out that the MA unit-root test for $\Delta x_t$ is useful to test for the trend stationarity of $x_t$ against difference stationarity. I shall follow the latest version on this line, Saikkonen and Luukkonen (1993a), and also present a particular implementation of the method that will be used in Chapter 8 for an experiment and in Chapter 9 for my analysis of economic time-series data. The deterministic trend is just a constant in the present section, but will be extended in Chapter 6.

3.1 Non-parametric Variance Ratios and Mean-Reverting

A virtually identical type of analysis was made about the same time by Huizinga (1987) on monthly real exchange rates, by Cochrane (1988) on annual US real GNP, and by Poterba and Summers (1988) on stock price and related variables in various time units.¹ All these studies attempt to get information about $b(1)$ in a non-parametric framework though they are not necessarily directed to the discrimination between DS and TS.

1 Campbell and Mankiw (1987), Fama and French (1988), and Lo and MacKinlay (1989) are also related to this research.

I use the notations in Chapter 2 and the model (2.1'). For a given $k > 0$

$$x_{t+k} - x_t = k\mu + b(1)\sum_{i=1}^{k}\varepsilon_{t+i} + b^*(L)(\varepsilon_{t+k} - \varepsilon_t)$$

and

$$E(x_{t+k} - x_t - k\mu)^2 = \sigma_\varepsilon^2\left[k\,b(1)^2 + \sum_{j=0}^{k-1} b_j^{*2} + 2b(1)\sum_{j=0}^{k-1} b_j^* + \sum_{j=0}^{\infty}(b_{j+k}^* - b_j^*)^2\right].$$

Define

$$v_k = k^{-1}E(x_{t+k} - x_t - k\mu)^2. \qquad (3.1)$$

Since $\sum_{j=0}^{k-1} b_j^{*2} + 2b(1)\sum_{j=0}^{k-1} b_j^* + \sum_{j=0}^{\infty}(b_{j+k}^* - b_j^*)^2$ remains bounded as $k \to \infty$, $\lim_{k\to\infty} v_k = b(1)^2\sigma_\varepsilon^2$.

Consider $r_k = v_k/v_1$, $k = 2, 3, \ldots$ to turn $v_k$ into a unitless number like a correlation coefficient. Since $E(\Delta x_{t+1} - \mu)^2 = \sigma_\varepsilon^2\sum_{j=0}^{\infty} b_j^2$,

$$r \equiv \lim_{k\to\infty} r_k = \frac{b(1)^2}{\sum_{j=0}^{\infty} b_j^2}. \qquad (3.2)$$

If $\{x_t\}$ is TS, $b(1) = 0$, and $r = 0$. If $\{x_t\}$ is DS, $r > 0$. An alternative view of $v_k$ is

$$v_k = \gamma_0 + 2\sum_{j=1}^{k-1}\left(1 - \frac{j}{k}\right)\gamma_j, \qquad (3.3)$$

where $\gamma_j$ is the autocovariance of $\{\Delta x_t\}$ for lag $j$. It is said that in many economic time-series $\gamma_j$ is positive for small $j$ but negative for large $j$, even though absolute values of $\gamma_j$ are small except for $j = 1$ or 2. In such series $r_k$ is larger than unity for small $k$ because $v_1 = \gamma_0$, but, as $k$ increases, $r_k$ gradually declines because negative $\gamma_j$s begin to be included in $v_k$. The case, $r < 1$, is called mean-reverting or trend-reverting, and $r > 1$ is called mean (trend)-averting. The trend stationarity is the extreme case of mean-reverting.

Finite amounts of data provide no evidence on the limit of $r_k$ as $k \to \infty$. We should be content with a hint given by $r_2, \ldots, r_{k^*}$, where $k^*$ is a cut-off point chosen to maintain the reliability of inference from a given length of available data, $T$. Since no parametric models are adopted here, I shall call $(r_2, \ldots, r_{k^*})$ the non-parametric variance ratios.

The $v_k$s are estimated by sample analogues of (3.1) except for degrees of freedom adjustment, for which readers are referred to Cochrane (1988). The estimates of $(r_2, \ldots, r_{k^*})$ have been plotted against $k$ in a two-dimensional graph, and confidence bands are set by various methods. A standard simulation method is used in Campbell and Mankiw (1987, 1989), Cochrane (1988), Lo and MacKinlay (1989), Christiano and Eichenbaum (1990), and Kormendi and Meguire (1990), and a $\chi^2$ (instead of normal) approximation to spectral estimates is used in Cogley (1990). Confidence bands are not easy to construct because a negative bias and skewness are found.
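The autocovariance view of $v_k$ lends itself to a small computation. The sketch below, in Python with NumPy, assumes that $\Delta x_t - \mu$ is a finite MA with known coefficients $b_0, b_1, \ldots, b_q$ and unit innovation variance by default, and uses the standard identity $v_k = \gamma_0 + 2\sum_{j=1}^{k-1}(1 - j/k)\gamma_j$; the function names are illustrative, not taken from the text.

```python
import numpy as np

def autocov_ma(b, sigma2=1.0):
    """Autocovariances gamma_0, ..., gamma_q of an MA process with coefficients b."""
    b = np.asarray(b, dtype=float)
    return np.array([sigma2 * np.dot(b[: len(b) - j], b[j:]) for j in range(len(b))])

def v_k(gamma, k):
    """gamma_0 + 2 * sum_{j=1}^{k-1} (1 - j/k) * gamma_j (gamma_j = 0 beyond lag q)."""
    j = np.arange(1, min(k, len(gamma)))
    return gamma[0] + 2.0 * np.sum((1.0 - j / k) * gamma[j])

def variance_ratio(b, k):
    """r_k = v_k / v_1 for the MA coefficients b of the differenced series."""
    gamma = autocov_ma(b)
    return v_k(gamma, k) / v_k(gamma, 1)

# TS case: Delta x_t = eps_t - eps_{t-1}, so b(1) = 0 and r_k = 1/k shrinks to zero.
print([variance_ratio([1.0, -1.0], k) for k in (2, 5, 10)])

# DS case: Delta x_t = eps_t + 0.5 eps_{t-1}, so r_k approaches b(1)^2 / sum b_j^2.
print(variance_ratio([1.0, 0.5], 1000), (1.5 ** 2) / (1.0 + 0.25))
```

In the first example the ratio reaches zero only in the limit, which is the practical difficulty taken up in Section 3.2: at any finite $k$ a TS series still produces a positive $r_k$.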


The studies analysing the non-parametric variance ratios are primarily concerned with discriminating the mean-reverting and mean-averting. Though not entirely irrelevant to our goal of discriminating the TS and DS, my survey of the results will be brief. As for the stock price Poterba and Summers (1988) and Fama and French (1988) conclude that the mean-reverting is supported by monthly and annual data in a number of countries, but Kim, Nelson, and Startz (1991) conclude that the mean-averting is observed in monthly data of the New York Stock Exchange since the Second World War. As for the historical data of real output Cochrane (1988) gets evidence in favour of the mean-reverting for the USA, but the international comparisons in Campbell and Mankiw (1989) on the post-war quarterly data and in Cogley (1990) and Kormendi and Meguire (1990) on the historical data find that the USA is an exception among a number of countries examined.

A linear trend has been taken into consideration in (3.1), but any structural changes in deterministic trends have not been. Banerjee, Lumsdaine, and Stock (1992) observe that the countries for which the real output is found mean-averting are also those for which the structural changes are significant. Demery and Duck (1992) also make a similar observation. The role that the structural changes play in discrimination between the TS and DS will be discussed in Chapter 8.

3.2 Difficulty of Discrimination through the Non-Parametric Variance Ratios²

One might think that the non-parametric variance ratios may be useful also for discriminating TS and DS. If the confidence band does not reach the zero line in the two-dimensional graph of $(r_2, \ldots, r_{k^*})$ plotted against $k$, we should accept the difference stationarity while admitting the limitation of evidence to those $k$s less than $k^*$. In fact there have been such attempts.

I shall argue that, when the sequence of $v_k$ is truncated at $k = k^*$, $v_k$ can take any values whatsoever within the models of TS so that the above confidence judgement lacks a theoretical ground. This argument provides a theoretical support to the scepticism that some earlier simulation studies have had about the confidence judgement.

2 I have benefited from comments by Kimio Morimune on an earlier draft of Section 3.2.

The starting-point of my argument is (3.3). I shall write $k$ for $k^*$ below to simplify the notations. Define

$$w_{j,k} = 1 - \frac{|j|}{k}, \qquad j = -k+1, \ldots, 0, \ldots, k-1. \qquad (3.4)$$

Then (3.3) is

$$v_k = \sum_{j=-k+1}^{k-1} w_{j,k}\gamma_j, \qquad (3.3')$$

where $\{\gamma_j\}$ is the autocovariance sequence of $\Delta x_t$. Keeping $k$ fixed, the Fourier transform of $w_{j,k}\gamma_j$, $j = -k+1, \ldots, 0, \ldots, k-1$, is

$$f_\Delta(\lambda, k) = \frac{1}{2\pi}\sum_{j=-k+1}^{k-1} w_{j,k}\gamma_j\exp(-i\lambda j). \qquad (3.5)$$

Then (3.3') is $2\pi f_\Delta(0, k)$, and this is what we are here concerned with. We shall re-express $2\pi f_\Delta(0, k)$. The Fourier transform of (3.4) is

$$h_B(\lambda, k) = \frac{1}{2\pi}\sum_{j=-k+1}^{k-1} w_{j,k}\exp(-i\lambda j) = \frac{1}{2\pi k}\left(\frac{\sin(\lambda k/2)}{\sin(\lambda/2)}\right)^2, \qquad (3.6)$$

as shown, for example, in Anderson (1971: 508-9). Let $f_\Delta(\lambda)$ be the spectral density function of $\{\Delta x_t\}$. Then $f_\Delta(\lambda, k)$ is related to $f_\Delta(\lambda)$ through

$$f_\Delta(\lambda, k) = \int_{-\pi}^{\pi} f_\Delta(\xi)h_B(\lambda - \xi, k)\,d\xi, \qquad (3.5')$$

which is proved in Appendix 1. The right-hand side of (3.5') is an average of $f_\Delta(\xi)$ over different values of $\xi$ with weighting function $h_B(\lambda - \xi, k)$ (as a function of $\xi$). As seen in Figure 3.1 $h_B(\lambda - \xi, k)$ is highest at $\lambda - \xi = 0$. Therefore the weights are highest at $\xi = \lambda$. In particular, when $\lambda = 0$

$$v_k = 2\pi f_\Delta(0, k) = 2\pi\int_{-\pi}^{\pi} f_\Delta(\xi)h_B(\xi, k)\,d\xi. \qquad (3.7)$$

In this averaging of $f_\Delta(\xi)$ the heaviest weight is placed at $\xi = 0$. We are concerned with (3.7).

FIG. 3.1 Bartlett spectral window function

In general the Fourier transform of weighted autocovariances (such as $w_{j,k}\gamma_j$, $j = -k+1, \ldots, 0, \ldots, k-1$) is called the spectral density 'looked through a window' because it is a weighted average of original spectral densities as shown in (3.5'). The weighting function on the time domain such as (3.4) is called the lag window, and the weighting function on the frequency domain such as (3.6) is called the spectral window. Specifically, when the weights on autocovariances are (3.4), the expressions (3.4) and (3.6) are called respectively the Bartlett lag window and Bartlett spectral window. The $v_k$ in (3.7) is $(2\pi)$ times (the spectral density at zero frequency looked through the Bartlett window). So much has been pointed out in Cochrane (1988), Campbell and Mankiw (1989), and Cogley (1990).
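The relation between the Bartlett lag window and its spectral window can be verified numerically. The sketch below, in Python with NumPy, assumes the normalization in which the spectral window is $(1/2\pi)\sum_j w_{j,k}\exp(-i\lambda j)$, so that it integrates to one over $[-\pi, \pi]$; the function names are illustrative assumptions.

```python
import numpy as np

def bartlett_lag_window(k):
    """Lags j = -(k-1), ..., k-1 and weights w_{j,k} = 1 - |j|/k."""
    j = np.arange(-(k - 1), k)
    return j, 1.0 - np.abs(j) / k

def bartlett_spectral_window(lam, k):
    """(1/2pi) * sum_j w_{j,k} exp(-i*lam*j), computed directly from the lag window."""
    j, w = bartlett_lag_window(k)
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    return (np.exp(-1j * np.outer(lam, j)) @ w).real / (2.0 * np.pi)

def fejer_closed_form(lam, k):
    """sin^2(k*lam/2) / (2*pi*k*sin^2(lam/2)), valid for lam not equal to zero."""
    lam = np.asarray(lam, dtype=float)
    return np.sin(k * lam / 2.0) ** 2 / (2.0 * np.pi * k * np.sin(lam / 2.0) ** 2)

k = 8
grid = np.linspace(0.1, 3.0, 30)
direct = bartlett_spectral_window(grid, k)
closed = fejer_closed_form(grid, k)

# Midpoint rule over [-pi, pi]; the window is a trigonometric polynomial,
# so the rule is exact up to rounding once there are more nodes than lags.
n_nodes = 4000
nodes = -np.pi + (np.arange(n_nodes) + 0.5) * 2.0 * np.pi / n_nodes
total_weight = float(np.sum(bartlett_spectral_window(nodes, k)) * 2.0 * np.pi / n_nodes)
peak = float(bartlett_spectral_window(0.0, k)[0])
print(total_weight, peak, k / (2.0 * np.pi))
```

The peak at zero frequency grows in proportion to $k$ while the total weight stays at one, which is the sense in which the central 'hill' narrows as $k$ increases.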

Given the spectral density of $\{\Delta x_t\}$, the $v_k$ can be calculated theoretically from (3.7). Figure 3.1 plots $h_B(\lambda, k)$ in (3.6) against $\lambda$ with $k$ fixed. The base of the central 'hill' narrows as $k$ increases. Now I suppose that $\{x_t\}$ is trend stationary, and after subtracting its deterministic trend the spectral density function is $g(\lambda)$. Then $\{\Delta x_t\}$ has the spectral density function, $2(1 - \cos\lambda)g(\lambda) = f_\Delta(\lambda)$, as shown in Appendix 1. Note that $f_\Delta(0) = 0$ no matter what $g(\cdot)$ is, which reconfirms the implication of $b(1) = 0$ given earlier in Section 2.2. The expression (3.7) becomes

$$v_k = 2\pi\int_{-\pi}^{\pi} 2(1 - \cos\xi)g(\xi)h_B(\xi, k)\,d\xi. \qquad (3.8)$$

For illustration suppose that $\{x_t\}$ is generated by

$$x_t = \rho x_{t-1} + \varepsilon_t, \qquad |\rho| < 1, \qquad (3.9)$$

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = 1$. Thus

$$g(\lambda) = \frac{1}{2\pi}\,\frac{1}{1 - 2\rho\cos\lambda + \rho^2}.$$

The graph of $f_\Delta(\lambda)$ with this $g(\cdot)$ is shown in Figure 3.2. The integrand in (3.8) is the product of two functions, $h_B$ in Figure 3.1 and $f_\Delta$ in Figure 3.2. For a given $k = k^*$, we can choose $\rho$ sufficiently close to unity so that the valley centred at $\lambda = 0$ in Figure 3.2 has sufficiently steep sides. This in turn makes the product of two functions sufficiently larger than zero over some domain of $\lambda$ as shown in Figure 3.3. Therefore the integral in (3.8) is positive rather than zero. The TS model (3.9) looks like a DS model.

FIG. 3.2 Spectral density function of $\Delta x_t$
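The argument can be made concrete with a few numbers. The sketch below assumes that the illustrative TS model is a stationary AR(1) with coefficient $\rho$ and unit innovation variance, for which $v_k = k^{-1}\mathrm{var}(x_{t+k} - x_t) = 2(1 - \rho^k)/\{k(1 - \rho^2)\}$ and hence $r_k = v_k/v_1 = (1 - \rho^k)/\{k(1 - \rho)\}$. The function name is an illustrative assumption.

```python
def variance_ratio_ar1(rho: float, k: int) -> float:
    """r_k = v_k / v_1 for the first differences of a stationary AR(1).

    Follows from v_k = 2 * (1 - rho**k) / (k * (1 - rho**2)) with unit
    innovation variance; valid for |rho| < 1.
    """
    return (1.0 - rho ** k) / (k * (1.0 - rho))

# Every row is a TS model, so r_k tends to zero as k grows, yet with rho
# near unity the ratio is still far from zero at cut-offs such as k = 30.
for rho in (0.5, 0.9, 0.99):
    print(rho, [round(variance_ratio_ar1(rho, k), 3) for k in (10, 30, 100)])
```

With a cut-off point of a few dozen lags, which is about what $T = 100$ allows, the ratio for $\rho = 0.99$ remains close to one although the limit is zero.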

FIG. 3.3 Product of window and spectral density

If we have infinite amount of data so that $k^*$ can be made $\infty$, then $h_B(\lambda, \infty)$ is the Dirac delta function concentrating on $\lambda = 0$. This makes $v_\infty$ zero in the case of the trend stationarity. This is consistent with the condition for TS in (3.2), i.e. $r = 0$.

It is admitted that, no matter what methods are used, the power to discriminate DS and TS with $T = 100$ is limited, but in my view the non-parametric variance ratios are not useful unless one has a priori belief in the difference stationarity. To be able to perform a reliable statistical inference on $b(1)$ one has to assume in addition that $f_\Delta(\lambda)$ for $\Delta x_t$ is smooth about $\lambda = 0$. The above point is closely related to the long-run variance in the Schwert ARMA, which will be discussed in Sections 7.2 and 7.3.3.

3.3 Time-Series Decomposition

A given time series can be decomposed into unobservable components each representing a specific time-series property with a parametric model. The most comprehensive literature is Harvey (1989). In relation to the discrimination between the trend and the difference stationarity relevant components are (i) the long-run or permanent component and (ii) business cycles or transitory component. Clark (1987) and Watson (1986) demonstrate that there are a number of different decompositions with different correlations between the two components. In response Cochrane (1988) observes that the variance of the long-run component does not depend upon how the decomposition is made. This is because, no matter how a given time series is decomposed, the transitory component does not contribute at all to the spectral density of $\Delta x_t$ at zero frequency. This theoretical observation notwithstanding, statistical estimates of variances of long-run components diverge among different decompositions. The decomposition in Clark (1989) produces a result in favour of TS contrary to Campbell and Mankiw (1989) concerning the real GNP in different countries.³

3 Lippi and Reichlin (1992) present an interesting inequality concerning the decomposition into unobservable components.

3.4 Parametric MA Unit-Root Test: A Test for Trend Stationarity against Difference Stationarity

Through a simulation study Rudebusch (1992) demonstrates an enormous discrepancy between the impulse-response functions of DS and TS models even when these models can hardly be discriminated in terms of the standard model fitting to the available data. It seems that $(b_0, b_1, \ldots)$ in Chapter 2 is a good indicator for the discrimination between TS and DS. However, non-parametric approaches to $b(1)$ are difficult with $T = 100$ in the available macroeconomic time-series data as shown in Section 3.2. We are thus led to trying a model with a finite number of parameters, i.e. a parametric approach. As indicated in equation (2.11), if $\{x_t\}$ is a stationary MA($q$) devoid of a long-run component, i.e. $b(1) = 0$, then $\{\Delta x_t\}$ is an MA($q + 1$) with an MA unit root. We are thus led to testing for the presence of an MA unit root.

This testing problem is non-standard in so far as the maximum-likelihood approach is adopted, because the null hypothesis specifies a point on the boundary of the admissible region of MA parameters. The maximum-likelihood approach should not be recommended (unless the boundary is properly dealt with).

3.4.1 LBI test

An inference theory called the invariance principle has seldom been used in econometrics, but it is this theory that has made the MA unit-root test possible. A general explanation of the invariance principle now follows. The observation, $x$, is produced by a member of a parametric family of models with a probability distribution represented by a value of a parameter, $\theta$. Consider a group of transformations, $G$, on the observation and an associated group of transformations, $\bar{G}$, on the parameter, such that the distribution of $Gx$ with $\theta$ is identical to that of $x$ with $\bar{G}\theta$. (For example, $x \sim N(\mu, 1)$, $G$ transforms $x$ to $x + c$ with some non-stochastic $c$, and $\bar{G}$ transforms $\mu$ to $\mu + c$.) We are confronted with an inference problem, testing for $H_0$ against $H_1$, and suppose that the differentiation between $H_0$ and $H_1$ remains unaltered through the transformations in $\bar{G}$, symbolically $\bar{G}(H_0) = H_0$, $\bar{G}(H_1) = H_1$. Then the invariance principle asserts that the test for $H_0$ against $H_1$ should be based upon what is called the maximal-invariant statistic. A statistic, $T(x)$, is maximal-invariant for a given inference problem if the following two conditions are met with $G$, which is the maximal-invariant group of transformations for the given inference problem: (i) $T(x) = T(g(x))$ for all $x$ and all $g \in G$, and (ii) if there is no $g \in G$ such that $x' = g(x)$, then $T(x') \ne T(x)$. See Cox and Hinkley (1972: 41-4, 157-61, 165) and Ferguson (1967: 143-52, 242-5).
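The two conditions for a maximal invariant can be illustrated with the location group used in the example above. The sketch below, in Python with NumPy, takes $G$ to be the shifts $x \to x + c$ applied to a whole sample and uses the deviations from the sample mean as the statistic; the choice of statistic and the sample values are illustrative assumptions.

```python
import numpy as np

def maximal_invariant(x):
    """Deviations from the sample mean: unchanged when a constant is added to x."""
    x = np.asarray(x, dtype=float)
    return x - x.mean()

x = np.array([1.0, 3.5, 2.0, 4.5])

# Condition (i): the statistic does not change along the orbit x + c.
shifted = x + 7.0
invariant = bool(np.allclose(maximal_invariant(x), maximal_invariant(shifted)))

# Condition (ii), in contrapositive form: samples with the same deviations
# differ only by a common shift, so they lie on the same orbit.
x_other = np.array([0.0, 2.5, 1.0, 3.5])
same_statistic = bool(np.allclose(maximal_invariant(x), maximal_invariant(x_other)))
shift = x_other.mean() - x.mean()
same_orbit = bool(np.allclose(x + shift, x_other))

# A sample on a different orbit yields a different value of the statistic.
x_far = np.array([1.0, 3.5, 2.0, 9.0])
separated = not np.allclose(maximal_invariant(x), maximal_invariant(x_far))
print(invariant, same_statistic, same_orbit, separated)
```

A test built on such a statistic gives the same verdict for every member of an orbit, which is what invariance with respect to the unknown mean requires.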

Another inference theory that has been introduced in the MA unit-root test is the locally best (LB) test. Suppose that $H_0$ states that $\theta = \theta^0$ and $H_1$ states that $\theta > \theta^0$. From among all possible ways to construct critical regions, each having given test size $\alpha$, the theory of the locally best test proposes to choose the one such that the slope of the power function from $\theta^0$ to $\theta^0 + \epsilon$ ($\epsilon$ being positive and infinitesimal) is the steepest. Suppose that $f(x, \theta)$ is the p.d.f. of $x$. Then it is seen by a reasoning analogous to the Neyman-Pearson lemma that the steepest slope is attained when the critical region consists of those $x$ such that

$$\left.\frac{\partial \log f(x, \theta)}{\partial\theta}\right|_{\theta = \theta^0} > k. \qquad (3.10)$$

The left-hand side of (3.10) denotes the partial derivative with respect to $\theta$ that is evaluated at $\theta = \theta^0$, and $k$ is chosen to obtain the given test size. See Ferguson (1967: 235-6). If the left-hand side of (3.10) is independent of $\theta^0$, the idea is extended to the curvature (instead of the slope) of the power function, giving the locally best unbiased (LBU) test. It introduces the second-order derivative of $\log f(\cdot)$ in the criterion to construct the critical region (see Ferguson (1967: 237-8)). These tests may be interpreted as an extended version of the score test with explicit recognition of the boundary condition (see Tanaka (1995a)).
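A textbook instance may help to fix the idea of the locally best test. The sketch below assumes an i.i.d. $N(\theta, 1)$ sample, which is not the model of this chapter; the score at $\theta^0$ is then $\sum_i (x_i - \theta^0)$, distributed as $N(0, n)$ under $H_0$, so the critical value follows from a normal quantile. Function names are illustrative; `statistics.NormalDist` is part of the Python standard library.

```python
import math
from statistics import NormalDist

def lb_score(x, theta0):
    """d/dtheta of the log-likelihood of an i.i.d. N(theta, 1) sample at theta0."""
    return sum(xi - theta0 for xi in x)

def lb_critical_value(n, alpha):
    """Under theta = theta0 the score is N(0, n); solve P(score > k) = alpha for k."""
    return math.sqrt(n) * NormalDist().inv_cdf(1.0 - alpha)

def lb_test(x, theta0=0.0, alpha=0.05):
    """Reject theta = theta0 in favour of theta > theta0 when the score exceeds k."""
    return lb_score(x, theta0) > lb_critical_value(len(x), alpha)

print(lb_critical_value(100, 0.05))
print(lb_test([0.5] * 100), lb_test([0.0] * 100))
```

In this simple model the locally best test coincides with the usual one-sided test on the sample mean; the MA unit-root problem differs in that the null value lies on the boundary of the parameter space.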

The invariance principle and the locally best test (or the locally best unbiased test) can be combined into the locally best invariant (LBI) test (or the locally best invariant unbiased (LBIU) test) (see Ferguson (1967: 246)). It has been implemented on the regression model with a disturbance covariance matrix, $\sigma^2\Omega(\theta)$, where $\Omega(0) = I$. The inference problem is to test $H_0: \theta = 0$ against $H_1: \theta \ne 0$, or $\theta > 0$. The relevant literature is Durbin and Watson (1971), Kariya (1980), King (1980), and King and Hillier (1985). The LBI test has also been implemented on the regression model with a time-varying parameter. The inference problem is to test $H_0$: (no variation in the parameter) against $H_1$: (a random walk is involved in the parameter). The literature is LaMotte and McWhorter (1978), Nyblom and Makelainen (1983), and Nabeya and Tanaka (1988).

3.4.2 MA unit-root test

Nyblom (1986) and Kwiatkowski et al. (1992) recognize that the testing for the stationarity of $\{x_t\}$ against the non-stationarity is a special case of the testing mentioned above in relation to the regression with a time-varying parameter. The testing is found invariant through linear transformations of observations (location and scale changes). It is in this background that the MA unit-root test has been developed in Tanaka (1990, 1995a), Kwiatkowski et al. (1992), and Saikkonen and Luukkonen (1993a). The testing methods that they propose are equivalent to each other in so far as $\{\Delta x_t\}$ is MA(1), i.e. $q = 0$ in (2.11).⁴ All of them extend their methods to the more general case, and the extension in Saikkonen and Luukkonen (1993a) differs from that in Tanaka (1990) and Kwiatkowski et al. (1992). The latter involves non-parametric estimations of long-run variances, the difficulties of which are explained in Section 7.3.3 below. On the other hand Saikkonen and Luukkonen (1993a) adopt an ARMA representation of $\{\Delta x_t\}$ avoiding non-parametric estimations of long-run variances.

4 The equivalence between Kwiatkowski et al. (1992) and Saikkonen and Luukkonen (1993a) is apparent. The equivalence between Tanaka (1990) and Saikkonen and Luukkonen (1993a) is indicated in Saikkonen and Luukkonen (1993b). I am also indebted to Katsuto Tanaka on this point.

I am now ready to introduce the MA unit-root test by Saikkonen and Luukkonen (1993a). Let $\{\varepsilon_t\}$ be Gaussian i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$, and let $\{\Delta x_t\}$ be an MA(1),

$$\Delta x_t = \varepsilon_t - \delta\varepsilon_{t-1}. \qquad (3.11a)$$

For the initial value it is assumed that

(3.11b)

It follows that

If $\delta = 1$, $x_t = \theta_0 + \varepsilon_t$. If $|\delta| < 1$, $\{x_t\}$ is difference stationary. Let $x' = (x_1, \ldots, x_T)$, $e' = (1, \ldots, 1)$, and $L^*$ be the $T \times T$ matrix with unities on the diagonal immediately below the main diagonal and zeros elsewhere. Also let $D = L^* + L^{*2} + \ldots + L^{*T-1}$ and $v = x - \bar{x}e$, where $\bar{x} = T^{-1}e'x$. Then the LBIU test for $\delta = 1$ against $\delta < 1$ is based upon

(3.12)

The critical region is set on those $S$ that are larger than a certain value, which is to be decided by the asymptotic distribution of $T^{-1}S$ so as to secure a given test size.

Since $e'v = 0$ and $I + D = ee' - D'$, (3.12) is equivalent to

(3.12')

$(I + D)v$ is a vector of partial sums of $v_t$ (accumulating from $t = 1$), and the intuitive meaning of the test statistic is clearer in (3.12') than in (3.12). The limiting distribution of $T^{-1}S$ will be given in Section 6.3.

Let us generalize (3.11a) and (3.11b) to

(3.13a)

(3.13b)

(3.13c)

where $L$ is the lag operator. If $\delta = 1$, $x_t = \theta_0 + u_t$, which is stationary. Let $u' = (u_1, \ldots, u_T)$ and $E(uu') = \sigma_\varepsilon^2\Sigma$, where $\Sigma$ is a function of $\gamma = (a_1, \ldots, a_p, b_1, \ldots, b_q)$ so that we write $\Sigma(\gamma)$. When $\delta = 1$ and $\gamma$ is known, $(e'\Sigma(\gamma)^{-1}e)^{-1}e'\Sigma(\gamma)^{-1}x$ is the GLS estimator of $\theta_0$, which we write $\hat{\theta}_0(1, \gamma)$.
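The matrix relations quoted in the text can be checked on a small example. The sketch below, in Python with NumPy, builds $L^*$ and $D$ for $T = 6$ and verifies three statements: $I + D = ee' - D'$, $(I + D)v$ is the vector of partial sums of the demeaned data, and, because $e'v = 0$, the quadratic forms $v'(I + D)'(I + D)v$ and $v'DD'v$ coincide. The data values are arbitrary, and the code does not attempt to reproduce the test statistic itself.

```python
import numpy as np

T = 6
L_star = np.eye(T, k=-1)                       # unities just below the main diagonal
D = sum(np.linalg.matrix_power(L_star, j) for j in range(1, T))
e = np.ones(T)

x = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])   # arbitrary illustrative data
v = x - x.mean() * e                           # demeaned observations, so e'v = 0

# I + D is the lower-triangular matrix of ones, including the diagonal.
identity_holds = bool(np.allclose(np.eye(T) + D, np.outer(e, e) - D.T))

# (I + D) v accumulates v from t = 1.
partial_sums = (np.eye(T) + D) @ v
matches_cumsum = bool(np.allclose(partial_sums, np.cumsum(v)))

# With e'v = 0, (I + D) v = -D' v, so the two quadratic forms agree.
quad_forms_agree = bool(np.isclose(partial_sums @ partial_sums, v @ D @ D.T @ v))
print(identity_holds, matches_cumsum, quad_forms_agree)
```

Working with `np.cumsum(v)` is the practical route; the explicit $T \times T$ matrices are needed only to see why the two expressions agree.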

Let $v(\gamma) = x - \hat{\theta}_0(1, \gamma)e$. If $\gamma$ is known, the LBIU test for $\delta = 1$ against $\delta < 1$ is based on

(3.14)

Since is equivalent to

(3.14')

When $\gamma$ is not known Saikkonen and Luukkonen (1993a) suggest estimating it by fitting

$$(1 - a_1L - \ldots - a_pL^p)\Delta x_t = (1 - \delta L)(1 - b_1L - \ldots - b_qL^q)\varepsilon_t \qquad (3.15)$$

to $\{x_t\}$. The domain of parameters is that (i) both $1 - a_1L - \ldots - a_pL^p$ and $1 - b_1L - \ldots - b_qL^q$ are invertible and (ii) $-1 < \delta \le 1$. Note that a possibly non-invertible MA is introduced through $(1 - \delta L)$. The $\hat{\gamma}$ that results is consistent under both the null and the alternative hypotheses, which in turn assures the consistency of the test based on (3.14'). The $\hat{\gamma}$ replaces $\gamma$ in (3.14').

I introduce one simplification. It is known that if $\{x_t\}$ is TS the asymptotic distributions of GLS and OLS of the mean (or more generally, coefficients in a polynomial trend) are identical.⁵ Therefore without altering the asymptotic null distribution the GLS, $\hat{\theta}_0(1, \gamma)$, may be replaced by the OLS, $T^{-1}e'x$.

When the deterministic trend is generalized from a constant, $\theta_0$, to a polynomial trend, all we have to do is to form $v(\gamma)$ by subtracting the OLS estimate of the trend. However, the asymptotic distribution of $T^{-1}S(\gamma)$ has to be adjusted for the trend subtraction, which will be shown in Section 6.3.

The US historical data have been analysed in Kwiatkowski et al. (1992) by a method analogous to but different from (3.14').⁶ The method by (3.14') will be applied in Chapter 9 to the US historical and post-war data after some investigation about modes of deterministic trends.

5 This is due to a theorem in Grenander and Rosenblatt (1957), for which a good exposition is found in Anderson (1971, ch. 10).

6 See also Leybourne and McCabe (1994).

7 The dominating root is real in TS. It may also be justified in the DS models that are close to TS.

I would like to present here some experiences obtained in Hatanaka and Koto (1994) on the estimation of $(a_1, \ldots, a_p, b_1, \ldots, b_q, \delta)$ in (3.15). Saikkonen and Luukkonen (1993a) suggest (a) to maximize the likelihood function for the ARMA($p, q+1$) model for $\Delta x_t$ and (b) to let $\delta$ be the largest dominating root of the MA($q + 1$) on the right-hand side. The (b) assumes that the dominating root is real.⁷ The (a) is not necessarily convenient because the domain of $\delta$ is $(-1, 1]$⁸ of which the right boundary is important. In Hatanaka and Koto (1994) (3.15) is approximated by

$$(1 - a_1L - \ldots - a_rL^r)\Delta x_t = (1 - \delta L)\varepsilon_t \qquad (3.15')$$

with a sufficiently large $r$. Our principle for selection of $r$ is that the risk of underfitting is unbearable but overfitting may be accepted. The equation, $\lambda^r - a_1\lambda^{r-1} - \ldots - a_r = 0$, has all roots less than unity in moduli. For a given $\delta$ in $(-1, 1)$ $(a_1, \ldots, a_r)$ is estimated by OLS on $(1 - \delta L)^{-1}\Delta x_t = \Delta x_t + \delta\Delta x_{t-1} + \ldots + \delta^{t-2}\Delta x_2$. For $\delta = 1$ $(a_1, \ldots, a_r)$ is estimated by OLS on $x_t$ if the resulting equation, $\lambda^r - a_1\lambda^{r-1} - \ldots - a_r = 0$, has its dominating root less than $1 - \epsilon$ ($\epsilon > 0$).⁹ If the dominating root exceeds $1 - \epsilon$, AR($r - 1$) is fitted to $x_t - (1 - \epsilon)x_{t-1}$. (This idea has been taken from Fukushige, Hatanaka, and Koto (1994).) The sum of squared residuals is calculated for each $\delta$ in $(-1, 1]$. The $\delta$ at which the sum is minimized is our estimate of $\delta$, and the associated $(a_1, \ldots, a_r)$ is our estimate of AR coefficients. In regard to the asymptotic theory $\epsilon$ is $O(T^{-1})$, but in practice $\epsilon$ is determined by simulation experiments for a given $T$.

The above procedure can be justified as follows.

(a) If $\{x_t\}$ is a stationary AR($r_0$) with $r_0 \le r$ and if the dominating root of the characteristic polynomial does not exceed $1 - \epsilon$, then plim $\hat{\delta} = 1$ and plim $(\hat{a}_1, \ldots, \hat{a}_r) = (a_1, \ldots, a_{r_0}, 0, \ldots, 0)$. Overfitting the lag order has no ill effects.¹⁰ If $r_0 < r$ then $\delta$ may be made redundant over $(-1, 1 - \epsilon)$ by letting $(1 - \delta L)$ be a factor of $(1 - a_1L - \ldots - a_rL^r)$, but the minimum of squared residual sums over $\delta$ does not occur in the interval $(-1, 1 - \epsilon)$.

(b) If $\{x_t\}$ is a non-stationary AR($r_0 + 1$) with $r_0 \le r$ and a single unit root, then the probability that $\hat{\delta}$ converges to unity is asymptotically zero. But $\hat{\delta}$ does not have a probability limit if $r_0 < r$, because $1 - a_1L - \ldots - a_rL^r$ can have a factor $1 - \delta L$, with $\delta$ in $(-1, 1 - \epsilon)$, leading to unidentifiability of $\delta$. As a result $(\hat{a}_1, \ldots, \hat{a}_r)$ is entirely unrelated to $(a_1, \ldots, a_{r_0})$. However, this tends to make the test more powerful than in the case where the right lag order is chosen.

With the pure AR($r$) model the Yule-Walker equations can be used to form $\Sigma(\gamma)$, which may now be written $\Sigma(a)$ with $a = (a_1, \ldots, a_r)'$.¹¹ The inversion of $\Sigma(a)$, which is $T \times T$, involves only an inversion of an $r \times r$ matrix, because

where $S(a)$ is a banded lower triangular matrix, $I - a_1L^* - \ldots - a_rL^{*r}$, and one can easily determine how $C$ depends on $a$.¹² ($L^*$ was defined earlier above (3.12).)

8 In denoting intervals between $a$ and $b$ both $a$ and $b$ are included in $[a, b]$; $b$ is, but $a$ is not, included in $(a, b]$; and neither $a$ nor $b$ is included in $(a, b)$.

9 It is after my work was completed that Choi (1993) became available. Choi (1993) proposes a method to set a confidence interval on $(1 - a_1 - \ldots - a_r)$.

10 Overfitting has been excluded from the consideration in the existing MA unit-root literature, but its consideration is necessitated by the practice of selection of lag orders stated in the text here and in Section 6.2.3, i.e. to avoid underfitting as much as possible while accepting risk of overfitting.

11 For an illustration consider $a_0x_t + a_1x_{t-1} + a_2x_{t-2} = \varepsilon_t$ with $a_0 = 1$, $E(x_tx_{t-j}) = \gamma_j$, $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. Let $\gamma' = (\gamma_0, \gamma_1, \ldots, \gamma_{T-1})$, $e_1' = (1, 0, \ldots, 0)$, and

Then $(A_1 + A_2)\gamma = \sigma_\varepsilon^2 e_1$, from which $\gamma$ can be obtained. Then $\Sigma(a)$ is $\gamma_0I_T + \gamma_1(L^* + L^{*\prime}) + \ldots + \gamma_{T-1}(L^{*T-1} + L^{*\prime T-1})$.

12 In the above illustration
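The grid search over $\delta$ described in Section 3.4.2 can be sketched as follows. This is a simplified illustration in Python with NumPy, not the procedure of Hatanaka and Koto (1994): it filters $\Delta x_t$ by $(1 - \delta L)^{-1}$ for each $\delta$ on a grid, fits an AR($r$) by OLS without intercept, and keeps the $\delta$ with the smallest sum of squared residuals. The safeguard involving $1 - \epsilon$ at $\delta = 1$ is omitted, and the function names, grid, seed, and simulated data are all assumptions made for the example.

```python
import numpy as np

def ssr_for_delta(x, delta, r):
    """SSR and coefficients of an OLS AR(r) fit to z_t = (1 - delta L)^{-1} Delta x_t."""
    dx = np.diff(np.asarray(x, dtype=float))
    z = np.empty_like(dx)
    z[0] = dx[0]
    for t in range(1, len(dx)):               # z_t = dx_t + delta * z_{t-1}
        z[t] = dx[t] + delta * z[t - 1]
    y = z[r:]
    lags = np.column_stack([z[r - i : len(z) - i] for i in range(1, r + 1)])
    coef, *_ = np.linalg.lstsq(lags, y, rcond=None)
    resid = y - lags @ coef
    return float(resid @ resid), coef

def estimate_delta(x, r, grid):
    """Grid value of delta with the smallest SSR, and the associated AR coefficients."""
    results = [ssr_for_delta(x, d, r) for d in grid]
    best = int(np.argmin([s for s, _ in results]))
    return float(grid[best]), results[best][1]

# Illustrative TS data: a constant plus white noise.
rng = np.random.default_rng(1)
x = 3.0 + rng.standard_normal(200)
grid = np.linspace(-0.95, 1.0, 40)
delta_hat, a_hat = estimate_delta(x, r=3, grid=grid)
print(delta_hat, a_hat)
```

At $\delta = 1$ the filtered series is $x_t - x_1$, so the fit there is an autoregression on the level measured from the first observation; a fuller implementation would add the check on the dominating root described in the text.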

Unit-Root AsymptoticTheories (I)

In the previous chapter we have learned how to test for TS against DS. In thesequel of developments in the unit-root field, however, it was preceded by thetesting for DS against TS. This in turn was made possible by the developmentof new asymptotic statistical theories on the unit root in Fuller (1976), Dickeyand Fuller (1979), Phillips (1987), and Phillips and Perron (1988) among others.These theories are explained in two steps. The first step is given in the presentchapter. It deals with the elementary but fundamental case where AJC, is i.i.d. Thesecond step is given in Chapter 6. It explains more advanced aspects includingthe case where Ax, is an ARMA.

Let us recall basic elements of the standard asymptotic theory, for example, Judge et al. (1985, ch. 5) or Spanos (1986, chs. 9, 10). If $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$, then $T^{-1/2}\sum_{t=1}^{T}\varepsilon_t$ converges in distribution as $T \to \infty$ to $N(0, \sigma_\varepsilon^2)$, which is the simplest form of the central limit theorem. Moreover, $T^{-1}\sum_{t=1}^{T}\varepsilon_t^2$ converges in probability to $\sigma_\varepsilon^2$, which is a version of the law of large numbers. The convergence in probability will be denoted by $\xrightarrow{P}$, and the convergence in distribution by $\xrightarrow{D}$.

The terms such as $T^{-1/2}$ and $T^{-1}$ may be looked upon as normalizers to get a well-defined probability distribution or a constant in the limit. Normalizers for the first and the second-order sample moments are respectively $T^{-1/2}$ and $T^{-1}$, and the limiting distribution is a normal distribution. Statements in the previous paragraph hold true on a wide class of models more complicated than the i.i.d.
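The role of the two normalizers can be checked numerically. The following is only a minimal sketch, assuming Gaussian i.i.d. errors and NumPy; the function name `normalized_moments` and the sample sizes are illustrative choices, not part of the text.

```python
import numpy as np

def normalized_moments(T, reps, sigma=1.0, seed=0):
    """Return T^{-1/2} * sum(eps_t) and T^{-1} * sum(eps_t^2) for `reps` i.i.d. samples."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(reps, T))   # i.i.d. errors (Gaussian is an assumption)
    first = eps.sum(axis=1) / np.sqrt(T)           # first-order moment, normalizer T^{-1/2}
    second = (eps ** 2).sum(axis=1) / T            # second-order moment, normalizer T^{-1}
    return first, second

first, second = normalized_moments(T=400, reps=4000, sigma=2.0)
print("variance of T^{-1/2} * sum:", first.var())        # should be near sigma^2
print("mean of T^{-1} * sum of squares:", second.mean())  # should be near sigma^2
```

With these normalizers the first quantity keeps a non-degenerate spread with variance near $\sigma_\varepsilon^2$, while the second settles near the constant $\sigma_\varepsilon^2$.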

More generally, a stochastic process $\{X_T\}$ with $T$ as index is said to be in the order of $T^{-\alpha}$, abbreviated as $O_p(T^{-\alpha})$, when the following condition holds. For any $\epsilon$ such that $1 > \epsilon > 0$ there exists $A_\epsilon$ such that $P[\,|T^{\alpha} X_T| < A_\epsilon\,] > 1 - \epsilon$ for all sufficiently large $T$, i.e. $T^{\alpha} X_T$ remains bounded in probability while $T \to \infty$. A lemma most frequently used to determine $O_p$ deals with the mean square as follows.

LEMMA. If there exists $c\,(> 0)$ such that $E(T^{2\alpha} X_T^2) < c$ for all $T$, then $X_T$ is $O_p(T^{-\alpha})$.¹

For the

4.1 Pure Random Walk without a Drift

The situation changes radically when a random walk is involved. Suppose that $\{x_t\}$ is generated from an i.i.d. process $\{\varepsilon_t\}$ by

$$x_t = x_{t-1} + \varepsilon_t, \qquad t = 1, \dots, T. \tag{4.1}$$

1 A proof is found e.g. in Fuller (1976: 185).

4

with zero mean, and


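Assuming that $E(x_0) = 0$ and $\varepsilon_t$, $t \ge 1$, is independent of $x_0$, it is seen that $E(x_t) = 0$ and $\mathrm{var}(x_t) = t\sigma_\varepsilon^2 + E(x_0^2)$. The appropriate normalizer of $\sum_{t=1}^{T} x_t$ is not $T^{-1/2}$. In fact, in considering its mean square,

$$E\Big(\sum_{t=1}^{T} x_t\Big)^2 = \sigma_\varepsilon^2\,(1^2 + 2^2 + \dots + T^2) + T^2 E(x_0^2),$$

it diverges to $\infty$ as fast as $T^3$, because $1^2 + 2^2 + \dots + T^2 = T(T+1)(2T+1)/6$. The appropriate normalizer is $T^{-3/2}$ as $E(T^{-3/2}\sum x_t)^2$ remains bounded as $T \to \infty$. As for $\sum_{t=1}^{T} x_t^2$ it can be shown that it is $O_p(T^2)$, and the appropriate normalizer is $T^{-2}$.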
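The mean-square argument can be verified with exact arithmetic. The sketch below assumes $x_0 = 0$, so that $\sum_{t=1}^{T} x_t = \sum_{s=1}^{T}(T - s + 1)\varepsilon_s$; the function names are hypothetical and serve only to illustrate the calculation.

```python
from fractions import Fraction

def mean_square_of_sum(T, sigma2=1):
    """E[(sum_{t=1}^T x_t)^2] for a pure random walk with x_0 = 0.

    eps_s enters the sum with weight (T - s + 1), so the mean square is
    sigma2 * (1^2 + 2^2 + ... + T^2).
    """
    return sigma2 * sum(Fraction(k * k) for k in range(1, T + 1))

def normalized_mean_square(T, sigma2=1):
    """Mean square of T^{-3/2} * sum x_t, which should stay bounded."""
    return mean_square_of_sum(T, sigma2) / Fraction(T) ** 3

for T in (10, 100, 1000):
    print(T, float(normalized_mean_square(T)))   # approaches sigma2 / 3
```

The normalized mean square tends to $\sigma_\varepsilon^2/3$, which is the variance of the Gaussian limit of $T^{-3/2}\sum x_t$.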

The mathematics of the stochastic process has long had a continuous-time stochastic process called the Wiener process or the Brownian motion process. The importance of this process in unit-root asymptotic theories was recognized in White (1958), and was emphasized in Phillips (1987). The Wiener process is explained in Appendix 2. Let $w(r)$, $1 \ge r \ge 0$, be the scalar standard Wiener process. It is a continuous-time version of the random walk with $\sigma_\varepsilon^2 = 1$, and for a given value of $r$, $w(r)$ is distributed in $N(0, r)$. Then

$$T^{-3/2}\sum_{t=1}^{T} x_t \xrightarrow{D} \sigma_\varepsilon \int_0^1 w(r)\,dr, \tag{4.2}$$

as explained in Appendix 2, the equation (A2.2). Here $\xrightarrow{D} X$ means convergence in distribution to the distribution of the random variable $X$. Since $x_0$ contributes only $T^{-3/2}\,T x_0 = T^{-1/2} x_0 \xrightarrow{P} 0$, $x_0$ has no effect upon the limiting distribution. The right-hand side of (4.2) is Gaussian, in fact $N(0, \sigma_\varepsilon^2/3)$, because it is linear in $w(\cdot)$. See Banerjee et al. (1993: 27) for more about this point.

As for $T^{-2}\sum_{t=1}^{T} x_t^2$ it converges in distribution to a random variable composed by the following functional of the standard Wiener process,

$$T^{-2}\sum_{t=1}^{T} x_t^2 \xrightarrow{D} \sigma_\varepsilon^2 \int_0^1 w(r)^2\,dr. \tag{4.3}$$

See Phillips (1987: 296) for a proof. The right-hand side of (4.3) is not distributed in a normal distribution. Notice that in (4.2) and (4.3) $\sum_{t=1}^{T}$ on the left-hand side corresponds to $\int_0^1$ on the right-hand side, and $x_t^i$ ($i = 1, 2$) on the left-hand side corresponds to $w^i$ on the right-hand side. This kind of correspondence is carried through more complicated expressions.

Consider a model

$$x_t = \rho x_{t-1} + \varepsilon_t, \tag{4.4}$$


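where $\{\varepsilon_t\}$ is i.i.d. with zero mean and variance $\sigma_\varepsilon^2$. The OLS of $\rho$ is

$$\hat\rho = \sum_{t=1}^{T} x_t x_{t-1} \Big/ \sum_{t=1}^{T} x_{t-1}^2. \tag{4.5}$$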

(note that $\varepsilon_t\varepsilon_s = \varepsilon_s\varepsilon_t$) it follows that

3 Nowadays the tables are produced by simulations on the functional of the Wiener process as explained in Section 7.3.1, but Dickey relied upon a simulation on a different expression that is applicable to both the finite sample and limiting distributions.

4 Figure 4.2 has been obtained from a histogram of simulation results. See the part, $c = 0$, in Nabeya and Tanaka (1990: fig. 1) for a more accurate figure of the p.d.f.

(4.5)

(4.6)

(4.7)

(4.8)

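Let $\rho^0$ be the true value of $\rho$ and consider the case $\rho^0 = 1$. Then the data-generating process of $\{x_t\}$ is (4.1), and we have

$$\hat\rho - 1 = \sum_{t=1}^{T} \varepsilon_t x_{t-1} \Big/ \sum_{t=1}^{T} x_{t-1}^2. \tag{4.6}$$

The limiting distribution of the denominator of (4.6) has been given in (4.3). (Note that $T^{-2} x_T^2$ converges in probability to zero so that it can be ignored.) As for the numerator

$$T^{-1}\sum_{t=1}^{T} \varepsilon_t x_{t-1} \xrightarrow{D} \sigma_\varepsilon^2 \int_0^1 w(r)\,dw(r). \tag{4.7}$$

Therefore combining (4.3) and (4.7) we obtain

$$T(\hat\rho - 1) \xrightarrow{D} \int_0^1 w(r)\,dw(r) \Big/ \int_0^1 w(r)^2\,dr. \tag{4.8}$$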

This will be a basis of many developments in the unit-root field.² Needless to say, $w(\cdot)$ in the denominator and in the numerator are the same one. The limiting distribution is tabulated by Dickey in Fuller (1976: 371, table 8.5.1, the part for $\hat\rho$, the line for $n = \infty$).³ Sometimes it is called the Dickey-Fuller distribution. Its location is shifted to the left of the origin, the distribution is skewed to the left, and $P(\hat\rho - 1 < 0)$ is larger than 0.5. See Fuller (1976: 370). Tables in Fuller (1976) also show finite sample distributions of $T(\hat\rho - 1)$ for $T$ = 25, 50, 100, 250, and 500. That for $T = 100$ is quite close to the limiting distribution. See also part (a) of Figure 4.1 and Figure 4.2, which have been compiled by Yasuji Koto.⁴
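The leftward shift and left skewness described above can be seen in a small Monte Carlo experiment. This is a rough sketch only, assuming Gaussian errors, $x_0 = 0$, and NumPy; `simulate_df_coefficient` is an illustrative name, and the output is not a substitute for the published tables.

```python
import numpy as np

def simulate_df_coefficient(T=100, reps=5000, seed=0):
    """Simulate T * (rho_hat - 1) for pure random walks, regression without a constant."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((reps, T))
    # x_0 = 0 followed by the partial sums x_1, ..., x_T (one random walk per row)
    x = np.hstack([np.zeros((reps, 1)), eps.cumsum(axis=1)])
    lag, cur = x[:, :-1], x[:, 1:]
    rho_hat = (lag * cur).sum(axis=1) / (lag ** 2).sum(axis=1)
    return T * (rho_hat - 1.0)

stats = simulate_df_coefficient()
print("share of rho_hat < 1:", (stats < 0).mean())
print("mean and median:", stats.mean(), np.median(stats))
print("5% quantile:", np.percentile(stats, 5))
```

More than half of the simulated values fall below zero, and the mean lies below the median, in line with the description of the distribution given in the text.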

2 The representation of the limit distribution of $\hat\rho$ by the Wiener process originates in White (1958), but the particular expressions in (4.7) and (4.8) are as recent as Chan and Wei (1988). Expressions alternative to (4.7) and (4.8) are also found in the literature. From the identity

i.e.


In standard models (i) $\sqrt{T}$ times the error of estimation converges in distribution to a random variable, and (ii) the random variable is distributed in normal distributions with zero mean. In the random-walk model the expression (4.8) reveals (i) that $T$ (instead of $\sqrt{T}$) times the error of estimation converges in distribution to a random variable, and (ii) that the random variable has a non-normal distribution. Comparing the above statement (i) for standard models and the statement (i) for the random-walk model, estimators in standard models are $\sqrt{T}$-consistent, but the estimator in the random-walk model is nearly $T$-consistent,⁵ sometimes called 'super-consistent'. Comparing the statements (ii), estimators in standard models have no biases in $O(T^{-1/2})$, and the estimator in the random-walk model has a negative bias in $O(T^{-1})$. Even in standard models there are usually biases in $O(T^{-1})$.

Note that (4.8) is free from $\sigma_\varepsilon^2$, and there is no need to consider the $t$-statistic. However I introduce it here for a later reference. The $t$-statistic to test the

5 It should not be said to be $T$-consistent because the expectation of the right-hand side of (4.8) is not zero.

FIG. 4.1 Cumulative distributions of functional of Wiener process

FIG. 4.2 A rough sketch of probability density function


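hypothesis $\rho = 1$ is

$$\hat t = (\hat\rho - 1)\Big(\sum x_{t-1}^2\Big)^{1/2} \Big/ s, \tag{4.9}$$

where $s^2$ is the residual variance estimate from the regression (4.4). We have

$$\hat t \xrightarrow{D} \int_0^1 w(r)\,dw(r) \Big/ \Big(\int_0^1 w(r)^2\,dr\Big)^{1/2} \tag{4.10}$$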

because of (4.3), (4.8), and $\operatorname{plim} s^2 = \sigma_\varepsilon^2$. The distribution in (4.10) is tabulated by Dickey in Fuller (1976: 373, table 8.5.2, the part for $\hat\tau$, the line for $n = \infty$). The distribution is shifted to the left of the origin.

4.2 Pure Random Walk possibly with a Drift

Consider an extension of (4.4),

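$$x_t = \mu + \rho x_{t-1} + \varepsilon_t. \tag{4.11}$$

Let $\mu^0$ and $\rho^0$ be the true values of $\mu$ and $\rho$ respectively, and assume that $\rho^0 = 1$. The data-generating process of $\{x_t\}$ is

$$x_t = \mu^0 + x_{t-1} + \varepsilon_t. \tag{4.12}$$

If $\mu^0 \ne 0$, $\{x_t\}$ has a linear deterministic trend, and $\mu^0$ is called a drift. The OLS of $\rho$ in (4.11) is

$$\hat\rho = \sum (x_t - \bar x)(x_{t-1} - \bar x_{-1}) \Big/ \sum (x_{t-1} - \bar x_{-1})^2, \tag{4.13}$$

where $\bar x$ and $\bar x_{-1}$ are the sample means of $x_t$ and $x_{t-1}$. We must examine the cases $\mu^0 = 0$ and $\mu^0 \ne 0$ separately.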

4.2.1 True $\mu$ is zero

Suppose that $\mu^0 = 0$ and $\rho^0 = 1$. Then the data-generating process is (4.1), and appropriate normalizers are identical to the previous ones. However, expressions involved in (4.13) are different from the previous ones, and here we have

(4.14)

(4.15)

where


This is due to Phillips and Perron (1988). The distribution in (4.16) has been tabulated by Dickey in Fuller (1976: 371, table 8.5.1, the part for $\hat\rho_\mu$, $n = \infty$). It is seen that the leftward location shift is even larger than in (4.8), where the sample means are not subtracted. Compare Figure 4.1 (a) and (b).

4.2.2 True $\mu$ is not zero

Suppose that $\mu^0 \ne 0$ and $\rho^0 = 1$. The data-generating process is now (4.12). In investigating the denominator of (4.13), the term, $(x_{t-1} - \bar x_{-1})$, consists of the deterministic part, $d_{t-1} = \mu^0(t - 1 - T/2)$, and the stochastic part, $v_{t-1} - \bar v_{-1}$, where $v_t = \sum_{s=1}^{t} \varepsilon_s$. For the deterministic part we see $\sum d_{t-1}^2 = O(T^3)$. For the stochastic part the $v_t$ here is $x_t$ in (4.1) (with $x_0 = 0$), and in view of (4.14) $\sum (v_{t-1} - \bar v_{-1})^2$ is $O_p(T^2)$. Thus within $\sum (x_{t-1} - \bar x_{-1})^2$ the deterministic part dominates the stochastic part, and $\sum (x_{t-1} - \bar x_{-1})^2$ should be normalized by $T^{-3}$. Moreover


The former is obtained from

Since

(4-16)

(4.16')

(4.17)

Similarly it can be seen that $\sum \varepsilon_t (x_{t-1} - \bar x_{-1})$ should be normalized by $T^{-3/2}$, and

where $o_p(1)$ means a term that converges to zero in probability. It can be shown that

(4.18)

Therefore, combining (4.17) and (4.18)


This has been pointed out in Dickey and Fuller (1979) and emphasized in West (1988). See Appendix 3, Sections A3.1 and A3.2 for a more accurate derivation of (4.17), (4.18), and (4.19). Keep in mind that $\hat\rho$ is $T^{3/2}$-consistent in the present context.

The conclusion derived from Sections 4.2.1-2 is that the asymptotic distribution of $\hat\rho$ differs radically, depending on whether $\mu^0 = 0$ or $\ne 0$. The distribution is Gaussian when $\mu^0 \ne 0$, but given by a functional of the Wiener process when $\mu^0 = 0$.

Hamilton (1994: 486-97) is a nice reference that explains the same materialas Chapter 4 in a different representation. The contents of the present Chapter 4are summarized at the end of Chapter 7.


5

Regression Approach to the Test for Difference Stationarity (I)

Let us consider how to test difference stationarity as the null hypothesis against trend stationarity, assuming that $\{x_t\}$ may possibly contain a linear deterministic trend.

5.1 A Method that Does not Work

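Suppose that we run the regression

$$x_t = \mu + \rho x_{t-1} + \text{residual}, \tag{5.1}$$

and construct the $t$-statistic to test $\rho = 1$,

$$t = (\hat\rho - 1)\Big(\sum (x_{t-1} - \bar x_{-1})^2\Big)^{1/2} \Big/ s,$$

where $s^2$ is the sum of squared residuals divided by $(T - 1)$. It turns out that this $t$ is useless to test DS against TS.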

(i) Let us analyse $t$ under the null hypothesis. When $\{x_t\}$ is DS and generated by (4.11) with $\rho^0 = 1$, the distribution of $t$ depends upon whether $\mu^0 = 0$ or $\ne 0$. Since $\mu^0$ is unknown we do not know which distribution to use to set the critical values. In the terminology of the testing theory the test based on $t$ is not similar. Just to continue on to the next point let us suppose that we choose an arbitrary negative value, $c$, so that the rejection region is $t < c$, because the test would be left one-sided as $\rho > 1$ is a priori ruled out.

(ii) Let us then analyse $t$ under the alternative hypothesis. $\{x_t\}$ is TS and generated by a linear trend plus a stationary process. Let $\hat\rho$ be the OLS estimate of $\rho$ in (5.1). Then in so far as the coefficient of the time variable is not zero in the linear trend, both $x_t$ and $x_{t-1}$ in (5.1) contain the time variable, which leads to $\operatorname{plim} \hat\rho = 1$. Perron (1988) shows that $(\hat\rho - 1)$ is $O_p(T^{-2})$ and that $\big(\sum (x_{t-1} - \bar x_{-1})^2\big)^{1/2}$ is $O_p(T^{3/2})$. Thus $t$ is $O_p(T^{-1/2})$. Note that $t \xrightarrow{P} 0$, while zero is not in the rejection region chosen above. As $T \to \infty$ the test never rejects the DS when the true data-generating process is TS. The test is inconsistent.

Both troubles mentioned in (i) and (ii) above would vanish if the presence ofa linear trend is a priori ruled out. See the later Section 5.3.
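The inconsistency in (ii) can be illustrated numerically. The sketch below is an illustration only: it assumes a trend-stationary series made of a linear trend plus a stationary AR(1) with Gaussian errors, and it uses a degrees-of-freedom divisor for $s^2$, which is a convenience choice; the function names and parameter values are hypothetical.

```python
import numpy as np

def trend_stationary(T, theta0=1.0, theta1=0.5, rho=0.5, seed=0):
    """Linear trend plus a stationary AR(1); returns x_0, ..., x_T."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T + 1)
    u = np.zeros(T + 1)
    for t in range(1, T + 1):
        u[t] = rho * u[t - 1] + eps[t]
    return theta0 + theta1 * np.arange(T + 1) + u

def t_stat_without_trend(x):
    """OLS of x_t on (1, x_{t-1}); returns rho_hat and the t-statistic for rho = 1."""
    x = np.asarray(x, dtype=float)
    y, lag = x[1:], x[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)               # degrees-of-freedom divisor (assumption)
    se = np.sqrt(s2 / np.sum((lag - lag.mean()) ** 2))
    return coef[1], (coef[1] - 1.0) / se

for T in (200, 5000):
    rho_hat, t = t_stat_without_trend(trend_stationary(T))
    print(T, rho_hat, t)
```

Although the series is trend stationary, the estimate of $\rho$ sits very close to one and the $t$-statistic stays near zero as $T$ grows, so a rejection region of the form $t < c$ with $c < 0$ is not reached.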

(5.1)residual,


5.2 Dickey-Fuller Test

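We turn to a regression approach that does work, i.e. the widely applied Dickey-Fuller test. In the present chapter it is assumed for simplicity that a deterministic part of $\{x_t\}$ is $\theta_0 + \theta_1 t$ and the stochastic part is AR(1). The model is

$$(1 - \rho L)(x_t - \theta_0 - \theta_1 t) = \varepsilon_t, \tag{5.2}$$

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. Suppose that $|\rho| < 1$ so that (5.2) represents a TS model. Define $\mu$ and $\beta$ by

$$(1 - \rho L)(\theta_0 + \theta_1 t) = \mu + \beta t. \tag{5.3}$$

When written concretely, $\mu = \theta_0(1 - \rho) + \rho\theta_1$ and $\beta = \theta_1(1 - \rho)$. Then (5.2) is

$$x_t = \mu + \beta t + \rho x_{t-1} + \varepsilon_t. \tag{5.4}$$

The transformation, $(\theta_0, \theta_1, \rho) \leftrightarrow (\mu, \beta, \rho)$, is one-to-one. The term, $\beta t$, distinguishes (5.4) from (5.1), which is the regression equation in the method that has failed.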

An important point here is that we do not consider (5.4) by itself separately from (5.2). The $\beta$ and $\rho$ in (5.4) are not variation-free because of the relation (5.3). If they were variation-free, (5.4) would produce a quadratic deterministic trend in $\{x_t\}$ by setting $\rho = 1$.

To consider the DS, set $\rho = 1$. Then (5.3) induces $\mu = \theta_1$ and $\beta = 0$ in (5.4), and we get

$$x_t = \theta_1 + x_{t-1} + \varepsilon_t, \tag{5.5}$$

whether (5.2) or (5.4) is used. (5.5) is nothing but (4.11) with $\rho = 1$, a random walk possibly with a drift. (5.5) is also what one would obtain if the constraint $(\beta, \rho) = (0, 1)$ is introduced in (5.4). The seminal Nelson and Plosser (1982) use the OLS on (5.4) (and its extension (7.2) below) to test for DS against TS. Indeed we can demonstrate that the $F$-statistic to test for $(\beta, \rho) = (0, 1)$ in (5.4) provides a similar and consistent test for DS against TS.

(i) Let us analyse the $F$-statistic under the null hypothesis. $\{x_t\}$ is DS and generated by (5.5). The unconstrained OLS, $(\hat\beta, \hat\rho)$, in (5.4) converges in probability to (0, 1). The $F$-statistic compares the sum of squares of the constrained OLS residuals with the sum of squares of the unconstrained OLS residuals, where the constraint is $(\beta, \rho) = (0, 1)$. Both of these residuals are free from $\mu$ in (5.4), and, if $\{x_t\}$ is generated with $\rho^0 = 1$, it is only in $\mu$ that the true value of $\theta_1$, $\theta_1^0$, is contained. It is thus seen that $\theta_1^0$ is not involved in either the constrained or the unconstrained OLS residuals. (This is seen in Appendix 3.3.1(ii) more concretely.) Therefore unlike (4.13') in the method that has failed, the present $F$-statistic is free from $\theta_1^0$, i.e. invariant to whether $\theta_1^0 = 0$ or $\ne 0$. The $F$-test is similar.

(ii) Let us then analyse the $F$-statistic under the alternative hypothesis, $|\rho^0| < 1$. The unconstrained OLS estimate of $(\beta, \rho)$ in (5.4) converges in probability to $(\theta_1^0(1 - \rho^0), \rho^0)$ and this is separated away from (0, 1) specified by the constraint. The $F$-statistic diverges as $T \to \infty$, and the test is consistent.
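The similarity argument in (i) can be checked directly: with the same innovations, the $F$-statistic computed from a random walk with and without a drift should coincide up to rounding. The sketch below is only an illustration under that setup; the function names, the degrees-of-freedom convention, and the sample size are assumptions, and critical values must still come from Dickey and Fuller (1981), not from the standard $F$ tables.

```python
import numpy as np

def df_f_stat(x):
    """F-statistic for (beta, rho) = (0, 1) in x_t = mu + beta*t + rho*x_{t-1} + e_t."""
    x = np.asarray(x, dtype=float)
    y, lag = x[1:], x[:-1]
    n = len(y)
    trend = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), trend, lag])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr_u = np.sum((y - X @ coef) ** 2)            # unconstrained residual sum of squares
    d = y - lag                                    # under the constraint only mu is free
    ssr_c = np.sum((d - d.mean()) ** 2)            # constrained residual sum of squares
    return ((ssr_c - ssr_u) / 2.0) / (ssr_u / (n - 3))

def random_walk(eps, drift=0.0, x0=0.0):
    """x_t = drift + x_{t-1} + eps_t, returned as x_0, ..., x_T."""
    return x0 + np.concatenate([[0.0], np.cumsum(drift + np.asarray(eps, dtype=float))])

eps = np.random.default_rng(0).standard_normal(200)
print(df_f_stat(random_walk(eps, drift=0.0)), df_f_stat(random_walk(eps, drift=1.0)))
```

The two printed values agree because the drift enters only through the intercept and the trend regressor, which both the constrained and the unconstrained regressions absorb.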

(5.52)

(5.3)

(5.4)

(5.5)


One can recognize another difference from the method that failed. No matter whether $\theta_1^0 = 0$ or $\ne 0$, the limiting distribution of $T(\hat\rho - 1)$ in (5.4) is not normal but a functional of the Wiener process if $\{x_t\}$ is DS. Recall that it is the time variable contained in $x_{t-1}$ that has made $\hat\rho$ distributed in a normal distribution in Section 4.2.2. Here in (5.4) the part of $x_{t-1}$ which is linearly related to another regressor, $t$, is eliminated from $x_{t-1}$ by the well-known logic of the least squares.¹

Explanations in the present and previous paragraphs have been somewhatintuitive. They will be supplemented by mathematical explanation in Section 6.1.

Dickey and Fuller (1981, table VI) present the finite sample and asymptoticdistributions of the F-statistic. The mathematical expression for the limitingdistribution will be given in (6.8c) below.

The $t$-statistic to test for $\rho = 1$ in (5.4) also provides a consistent test for the DS against the TS. It is free from $\theta_1^0$, and it diverges to minus infinity if $\{x_t\}$ is generated by a TS model. The finite sample and asymptotic distributions of the $t$-statistic are tabulated by Dickey in Fuller (1976: 373, table 8.5.2, the part for $\hat\tau_\tau$). The test should be one-sided, setting the critical region on the side of $\rho < 1$, i.e. on sufficiently negative values of the statistic.

The $t$-test has been used most widely in the past empirical unit-root studies ever since Nelson and Plosser (1982), but my opinion is that the $F$-statistic should be recommended on the ground that it tests a larger number of constraints implied by $\rho = 1$. Readers might wonder if the $F$- and $t$-tests are asymptotically equivalent in view of the multicollinearity between two regressors of (5.4), $t$ and $x_{t-1}$. The conjecture is incorrect. The multicollinearity arises only when $\theta_1^0 \ne 0$. On the other hand Appendix 3.3 shows that the $F$- and $t$-tests deal with the terms invariant to $\theta_1^0 = 0$ or $\ne 0$, as seen from the numerator of the $F$-statistic shown in equation (A3.12). The limiting distributions of the $F$- and $t$-statistics are given in (6.8c) and (6.8b) below, and they are not identical.²

A number of new testing methods are appearing at the time of writing, for example, Choi (1992), Elliott, Rothenberg, and Stock (1992), Schmidt and Phillips (1992), and Ahn (1993). Schmidt and Phillips (1992) and Ahn (1993) develop a Lagrange multiplier test in the parametrization in which all parameters are variation-free (in contrast to $(\beta, \rho)$ in (5.4)). Levin and Lin (1992) and Quah (1993) consider panel data, and Toda and McKenzie (1994) investigate missing observations. None of these will be surveyed in the present book.

1 Readers may be perplexed by the absence of $\theta_0$ in the above explanation. When $\rho^0 = 1$, $\theta_0$ is not asymptotically identified. When $|\rho^0| < 1$ it is identified.

2 The $t$-test is one-sided because $\rho > 1$ is a priori excluded from our consideration. On the other hand the $F$-test is bound to be two-sided in this sense, and it might make the $F$-test inefficient. I do not imagine that the loss of efficiency is large. Comparing initially the $t$ and $t^2$ statistics, it is seen that most of the probability for $(\hat\rho - 1)^2 > a^2$ comes from that for $\hat\rho < 1 - a$ because the distribution of $t$ is left-shifted and left-skewed. Comparing next the $t^2$ and the $F$-statistic for $(\beta, \rho) = (0, 1)$, the limit distribution of the latter consists of two terms as seen in (6.8c), and the second term is that of the $t^2$ while the first term is concerned with $\beta = 0$ and irrelevant to the present comparison. None the less it is worth making a simulation study.


5.3 The Case where the Deterministic Trend is Confined to a Constant

So far we have considered how to test for DS against TS in the possible presence of a deterministic linear trend. The test is valid no matter whether the linear trend does or does not exist. Occasionally we wish to perform the test under the constraint that the deterministic trend is just a constant under both the DS and TS hypotheses, i.e. $\theta_1 = 0$ in (5.2). For example, we might be fairly sure of the constraint as a result of the model selection analysis in regard to the form of deterministic trends. Then the equation (5.1) can be reintroduced with $\mu = \theta_0(1 - \rho)$. When $\{x_t\}$ is generated with $\rho^0 = 1$, the constraint forces $\mu^0$ to zero. Run OLS on

$$x_t = \mu + \rho x_{t-1} + \text{residual}. \tag{5.6}$$

The $F$-statistic to test for $(\mu, \rho) = (0, 1)$ in (5.6) can be used to test for DS against TS. The finite sample and asymptotic distributions of the statistic when $\{x_t\}$ is generated with $\rho^0 = 1$ are found in Dickey and Fuller (1981, table IV). The inconsistency mentioned earlier in Section 5.1 does not arise in the present TS hypothesis, i.e. $\{x_t\}$ is a stationary AR with a constant mean.

The $t$-statistic to test for $\rho = 1$ can also be used. The distributions under the DS hypothesis are found in Fuller (1976: 373, table 8.5.2, the part for $\hat\tau_\mu$), but the $F$-statistic is more desirable.

(5.6)residual,


6

Unit-Root Asymptotic Theories (II)

The present chapter assembles three unrelated topics of asymptotic theories. Looking back to Chapter 5, Section 6.1 presents the mathematical analysis of the tests on the case where $\{\Delta x_t\}$ is i.i.d. possibly with a non-zero mean. Looking forward to Chapter 7, Section 6.2 explains the mathematics used for the case where $\{\Delta x_t\}$ is serially correlated. Section 6.3 gives the asymptotic theory of the MA unit-root test explained in Section 3.4.2.

6.1 Deterministic Trends

I shall present a mathematical explanation of Section 5.2. It should be read in conjunction with Appendix 3.3.

We begin with a basic asymptotic relation. If $v_t = \sum_{s=1}^{t} \varepsilon_s$ and $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$, then

$$T^{-5/2}\sum_{t=1}^{T} t\,v_t \xrightarrow{D} \sigma_\varepsilon \int_0^1 r\,w(r)\,dr, \tag{6.1}$$

which is seen by writing $T^{-5/2}\, t v_t = (t/T)\cdot T^{-3/2} v_t$. (6.1) is due to Phillips and Perron (1988). The right-hand side of (6.1) is actually Gaussian, $N(0, (2/15)\sigma_\varepsilon^2)$.

The unconstrained OLS estimator, $(\hat\mu, \hat\beta, \hat\rho)$, in (5.3) is analysed in Appendix 3.3. OLS residuals unconstrained and constrained by $(\beta, \rho) = (0, 1)$ are also shown there. On the basis of the formulae derived there I shall here show the limit distribution of $T(\hat\rho - 1)$, the $t$-statistic, and the $F$-statistic on (5.3). It is convenient, though not necessary,¹ to orthogonalize deterministic regressors as we can then proceed along regressions with orthogonal regressors. Let $\tilde t = t - (T + 2)/2$ so that $\tilde t$ sums to zero over $t = 2, \dots, T$. Then

1 All results of the following derivation can be obtained without the orthogonalization. Regress $x_{t-1}$ upon $(1, t)$, and let $\tilde x_{t-1}$ be the residual. Then $\hat\rho$ is obtained by regressing $x_t$ upon $\tilde x_{t-1}$. A normalized $\tilde x_{t-1}$ corresponds to a Wiener process

where $\tau(r) = (1, r)$. It is seen that the above Wiener process is identical to $w_2(r)$ defined in (6.5), and that the right-hand side of (6.8a) corresponds to the coefficient in regressing $\varepsilon_t$ upon $\tilde x_{t-1}$. However, one is likely to use (6.5) rather than (*) in tabulating the distributions by simulations as explained in Section 7.3.3.


2 For two functions, $f(r)$ and $g(r)$, defined over $1 \ge r \ge 0$ the inner product of $f$ and

the squared length of g is


$\lim_{T\to\infty} T^{-3}\sum \tilde t^{\,2} = 1/12$, and, using (4.2) and (6.1), we see that

(6.2)

which is actually Gaussian. The upper and lower limits of all integrals below are 1 and 0 respectively. We are concerned with (A3.10a) and (A3.10b). Using (4.2), (4.3), and (6.2) we see

(6.3)

The left-hand side of (6.3) is identical to

(6.4)

Using $w_1(r)$ in (4.16') and writing

(6.5)

(6.6)

we see that

Readers are advised to confirm that the right-hand sides of (6.3) and (6.6) are indeed identical, and also that

is the projection of $w(r)$ onto the linear subspace spanned by $(1, r - \tfrac{1}{2})$.²

If the true data-generating process is (5.2) with $\rho^0 = 1$, it is shown in Appendix 3.3 that $T(\hat\rho - 1) \approx A^{-1}A_1$, where $A$ and $A_1$ are defined in (A3.10a) and (A3.10b). From the above explanations we see

Moreover

the expression in

so that the projection of f on g isg is


Thus


Using this it can be shown that

(6.7)

(6.8a)

(6.86)the f-statistic to test

(6.8c)

The distribution (6.8a) has an even stronger leftward shift of location than (4.16). See Fuller (1976: 371, table 8.5.1, the part for $\hat\rho_\tau$, $n = \infty$) and Figure 4.1(c) above. The essential part of the $F$-test statistic is derived in (A3.12), and we get

the $F$-statistic to test

(6.8a) through (6.8c) are due to Ouliaris, Park, and Phillips (1989). There is an alternative expression for $\int (r - \tfrac{1}{2})\,dw(r)$. Since

which is due to Schmidt and Phillips (1992). We thus have

(6-9)

In Section 5.3 we considered the case where the deterministic trend is confined to a constant. On the OLS along (5.6) it can be shown that

the $F$-statistic to test (6.10)

where $w_1(r)$ was defined in (4.16').


where $\sigma^2 = \sum_{j=-\infty}^{\infty} \gamma_j$, i.e. the long-run variance of $\{u_t\}$, and $\delta = \sum_{j=1}^{\infty} \gamma_j$. Expressions (6.12), (6.13), (6.14), and (6.15) are generalizations of (4.2), (4.3), (4.7), and (6.1) respectively. Note the role that the long-run variance plays in place of $\sigma_\varepsilon^2$ and also the presence of $\delta$ in (6.14).
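For a finite-order moving average the two quantities can be computed in closed form. The sketch below assumes $u_t = \sum_{j=0}^{q} b_j \varepsilon_{t-j}$ with i.i.d. $\varepsilon_t$; the function names are illustrative. For such a process the long-run variance equals $b(1)^2\sigma_\varepsilon^2$, which the example checks for an MA(1).

```python
import numpy as np

def ma_autocovariances(b, sigma2_eps=1.0):
    """gamma_0, ..., gamma_q of u_t = sum_j b[j] * eps_{t-j}, where b[0] is b_0."""
    b = np.asarray(b, dtype=float)
    q = len(b) - 1
    return np.array([sigma2_eps * np.dot(b[: q + 1 - k], b[k:]) for k in range(q + 1)])

def long_run_quantities(b, sigma2_eps=1.0):
    """Return (sigma^2, delta): the long-run variance and the sum of gamma_j over j >= 1."""
    gamma = ma_autocovariances(b, sigma2_eps)
    delta = gamma[1:].sum()
    sigma2 = gamma[0] + 2.0 * delta        # gamma_{-j} = gamma_j
    return sigma2, delta

sigma2, delta = long_run_quantities([1.0, 0.5], sigma2_eps=2.0)
print(sigma2, delta)                        # compare sigma2 with b(1)^2 * sigma2_eps
```

When $u_t$ is i.i.d. ($q = 0$), $\delta = 0$ and the long-run variance reduces to $\sigma_\varepsilon^2$, which recovers the expressions of Chapter 4.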

The role of long-run variance can be explained by the expression (2.8) because $\Delta x_t = b(L)\varepsilon_t$ there may be equated to $u_t$ here. Then in regard to (6.12)


6.2 Serial Correlations in $\Delta x_t$

6.2.1 Asymptotic theories

Let us consider the case where the process of $\Delta x_t$, denoted $u_t$, is stationary with the covariance sequence, $\dots, \gamma_{-1}, \gamma_0, \gamma_1, \dots$ $(\gamma_{-j} = \gamma_j)$. For simplicity I assume initially that $E(u_t) = 0$. $\{x_t\}$ is generated by

$$x_t = x_{t-1} + u_t. \tag{6.11}$$

Phillips (1987, 1988a) shows that

(6.12)

(6.13)

(6.14)

(6.15)

(6.16)

The last two terms on the right-hand side converge to zero in probability as $T \to \infty$, and the first term converges in distribution to the right-hand side of (6.12), where $\sigma^2 = b(1)^2\sigma_\varepsilon^2$. As for (6.14), writing $\Delta x_t = b(1)\varepsilon_t + b^{*}(L)(\varepsilon_t - \varepsilon_{t-1})$ because of (2.7), and assuming $x_0 = 0$, we see that

(6.11)


which is due to Phillips and Perron (1988). The estimation of $\delta$ and $\sigma^2$ poses a problem as explained in Section 7.3.3 below.

Applying the fully modified least squares (given in Appendix 6) to a univariate autoregressive process, Phillips (1993) obtains an estimator which is $T^{3/2}$- rather than $T$-consistent. It is called the hyperconsistency. The new method differs from OLS in adopting $x_t - \Delta x_{t-1}$ for the dependent variable, while $x_{t-1}$ remains the independent variable. The crucial point that leads to the $T^{3/2}$-consistency is that $u_t - \Delta x_{t-1} = u_t - u_{t-1}$ has zero long-run variance. At the time of writing we are yet to see the impact of this discovery upon the whole inference problems in the unit-root field. (I comment on the application to co-integrated VAR in Appendix 6.)

6.2.2 OLS and fully modified OLS

Consider the model, $x_t = \rho x_{t-1} + u_t$, in which no deterministic trends are involved. If $\rho^0 = 1$ and $\hat\rho$ is (4.5),

(6.18)

of which the first term converges in probability to $-\sigma_\varepsilon^2 \sum_{j=0}^{\infty} b_j b_j^{*} = \sigma_\varepsilon^2 \sum_{j=0}^{\infty} b_j (b_{j+1} + b_{j+2} + \dots) = \delta$ because of (2.9) and (2.10), and it can be shown that the second term converges to zero in probability.

(6.17c)

The second term is


The first term of the last expression is

(6.17a)

(6.17b)


6.2.3 Fuller transformation in the AR

When the model of $\{\Delta x_t\}$ is parametrically specified one would look for an estimator of $\rho$ better than the above OLS. Fuller (1976: 373-7) considers the case where $\{\Delta x_t\}$ is a stationary AR($p - 1$),

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. A model for $\{x_t\}$ that includes (6.19) as well as stationary AR($p$) is

where (6.20') is assumed to be a stationary AR($p - 1$). If $|\rho| < 1$, $\{x_t\}$ is a stationary AR($p$). If $\rho = 1$, $\{x_t\}$ is DS, and $\Delta x_t$ is AR($p - 1$) as indicated in (6.19). In Fuller (1976: 373) new parameters $(\alpha_1, \dots, \alpha_p)$ are defined through

The transformation, $(\rho, a_1, \dots, a_{p-1}) \leftrightarrow (\alpha_1, \alpha_2, \dots, \alpha_p)$, is one-to-one. In particular it is seen by setting $L = 1$ that

$$1 - \alpha_1 = (1 - \rho)(1 - a_1 - \dots - a_{p-1}). \tag{6.22}$$

Let $f(\lambda) = \lambda^{p-1} - a_1\lambda^{p-2} - \dots - a_{p-1}$. The characteristic equation of (6.20') is $f(\lambda) = 0$. Since it is assumed to have stable roots only, and since $f(+\infty) = +\infty$, $f(1) = 1 - a_1 - \dots - a_{p-1}$ must be positive. From (6.22) it is seen that $\alpha_1 = 1 \Leftrightarrow \rho = 1$ and that $\alpha_1 < 1 \Leftrightarrow \rho < 1$. Note also that when $\rho = 1$, $\alpha_{i+1} = a_i$, $i = 1, \dots, p - 1$.

The Fuller reparametrization through (6.21) represents the model, (6.20) and(6.20'), as

or, subtracting $x_{t-1}$ from both sides

This expression has the advantage that when $\rho = 1$ the stationary and the non-stationary variables are separated, i.e. $x_{t-1}$ is non-stationary but $\Delta x_{t-1}, \dots, \Delta x_{t-p+1}$ are stationary. Correspondingly $\hat\alpha_1$ is $T$-consistent whereas $(\hat\alpha_2, \dots, \hat\alpha_p)$ is $\sqrt{T}$-consistent as shown in Appendix 4. Another advantage is that a non-linear least-squares calculation is required to estimate $(\rho, a_1, \dots, a_{p-1})$ in (6.20) and (6.20') while an OLS is sufficient to estimate $(\alpha_1, \dots, \alpha_p)$ in (6.23).
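The mapping from $(\rho, a_1, \dots, a_{p-1})$ to $(\alpha_1, \dots, \alpha_p)$ can be sketched numerically. The code assumes the reparametrized equation has the form $x_t = \alpha_1 x_{t-1} + \alpha_2 \Delta x_{t-1} + \dots + \alpha_p \Delta x_{t-p+1} + \varepsilon_t$, which is consistent with (6.22) and with the remark that $\alpha_{i+1} = a_i$ when $\rho = 1$; the displayed equations (6.21) and (6.23) themselves are not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def fuller_alphas(rho, a):
    """Map (rho, a_1, ..., a_{p-1}) to (alpha_1, ..., alpha_p).

    Assumed form: x_t = alpha_1 x_{t-1} + sum_{j=2}^p alpha_j * dx_{t-j+1} + eps_t.
    """
    # AR(p) polynomial: (1 - rho L)(1 - a_1 L - ... - a_{p-1} L^{p-1}) = 1 - c_1 L - ... - c_p L^p
    poly = np.convolve([1.0, -rho], np.concatenate([[1.0], -np.asarray(a, dtype=float)]))
    c = -poly[1:]                                       # c_1, ..., c_p
    alpha1 = c.sum()                                    # alpha_1 = c_1 + ... + c_p
    rest = [-c[j:].sum() for j in range(1, len(c))]     # alpha_j = -(c_j + ... + c_p), j >= 2
    return np.concatenate([[alpha1], rest])

print(fuller_alphas(1.0, [0.5, -0.2]))   # with rho = 1 the result is (1, a_1, a_2)
print(fuller_alphas(0.8, [0.5, -0.2]))   # 1 - alpha_1 equals (1 - rho)(1 - a_1 - a_2)
```

The first call illustrates that a unit root shows up as $\alpha_1 = 1$ with the remaining coefficients equal to the AR coefficients of $\Delta x_t$; the second reproduces relation (6.22).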

For a later reference I briefly point out another parametrization of (6.20) and(6.20'). They are combined into

(6.19)

(6.20)

(6.20)

(6.24

(6.23)

(6.21)


(6.23) and (6.24) are related through

Let us return to the parametrization (6.23'). Let the true value of $\alpha_1$ be $\alpha_1^0$ and assume $\alpha_1^0 = 1$. Run OLS along (6.23'). The distribution of the estimator of $\alpha_1 - 1$ depends on the long-run variance $\sigma^2$, which in turn depends upon the $a$'s and $\sigma_\varepsilon^2$. In fact

However, the testing for DS does not require an estimate of $\sigma^2$. Construct the $t$-statistic to test $\alpha_1 - 1 = 0$, and let $\hat t$ be the statistic. In Appendix 4 readers find a proof for the statement that if $\alpha_1^0 = 1$

(6.27)

A remarkable point here is that unlike (6.18) this limiting distribution is free from all nuisance parameters, $a_1, \dots, a_{p-1}$, and $\sigma_\varepsilon^2$. It has been tabulated as mentioned in Section 5.2. On the other hand if $|\alpha_1^0| < 1$, $\hat\alpha_1$ converges in probability to $\alpha_1^0\,(\ne 1)$, and $\hat t$ diverges to $-\infty$. The $\hat t$ provides a consistent test for the difference stationarity against stationarity. This method is called the augmented Dickey-Fuller test.
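A bare-bones version of the $t$-statistic can be sketched as follows. It assumes the regression $x_t = \alpha_1 x_{t-1} + \alpha_2 \Delta x_{t-1} + \dots + \alpha_p \Delta x_{t-p+1} + \varepsilon_t$ without deterministic terms; the function name `adf_t_stat`, the degrees-of-freedom divisor, and the simulated examples are illustrative assumptions. The value must be compared with the Dickey-Fuller tables, not with the normal or Student $t$ tables.

```python
import numpy as np

def adf_t_stat(x, p):
    """t-statistic for alpha_1 = 1 in the regression of x_t on x_{t-1} and p-1 lagged differences."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dx = np.diff(x)                        # dx[i] = x[i+1] - x[i]
    y = x[p:]                              # x_t for t = p, ..., n-1
    cols = [x[p - 1 : n - 1]]              # x_{t-1}
    for j in range(1, p):
        cols.append(dx[p - 1 - j : n - 1 - j])   # lagged difference x_{t-j} - x_{t-j-1}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return (coef[0] - 1.0) / np.sqrt(cov[0, 0])

rng = np.random.default_rng(1)
e = rng.standard_normal(2000)
s = np.zeros(2000)
for t in range(1, 2000):
    s[t] = 0.5 * s[t - 1] + e[t]           # stationary AR(1)
print(adf_t_stat(s, 2))                    # strongly negative: stationary series
print(adf_t_stat(np.cumsum(s), 2))         # moderate: unit root with AR(1) differences
```

For the stationary series the statistic is far in the left tail, while for the integrated series it stays in the range covered by the tabulated null distribution.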

In practice the order of AR, $p - 1$, is unknown. To select $p$ one should follow the general-to-specific principle in Hendry (1979). A sufficiently high order, $p_{\max}$, is chosen, and initially one tests for $p_{\max} - 1$ against $p_{\max}$ by one (or all) of the following tests: (i) the $t$-test to see the significance of the coefficient of $\Delta x_{t-p+1}$ in (6.23),³ (ii) the portmanteau test that investigates serial correlations of the residual series $\{e_t\}$, on which readers are referred to Box and Jenkins (1976, ch. 8), and (iii) the orthogonality test to investigate correlations between $\{e_t\}$ and $\{\Delta x_{t-p-\tau}\}$, $\tau \ge 1$, which has been proposed in Hendry and Richard (1982). If $p_{\max} - 1$ is not rejected, one tests for $p_{\max} - 2$ against $p_{\max} - 1$. The sequential tests are terminated when $p_{\max} - j$ is rejected against $p_{\max} - j + 1$. The asymptotic distributions of these test statistics are identical to those in the stationary AR models, because the limit distribution of $(\hat\alpha_2, \dots, \hat\alpha_p)$ is the same no matter whether $\alpha_1^0 = 1$ or $\ne 1$ (see Appendix 4.2). With each one of the three testing methods different levels of significance imply different selections of lag orders.

Our concern is not symmetric between the choice of p larger than its truevalue and that of p smaller. The former leads to some loss of efficiency, whilethe latter may seriously distort the test.

To the best of my knowledge a comprehensive simulation study has not beendone on the significance levels. I think we had better focus our concern not

3 See Anderson (1971: 42) for a theoretical analysis of levels of significance.

(6.25a)

(6.25a)

6.26)


on the frequency of a correct selection of lag orders but on the effect that the frequency of incorrect selection has upon the performance of the augmented Dickey-Fuller test. Ignoring $\alpha_p$ that is nearly zero would have no effect upon the performance of the test. On the other hand, ignoring a large value of $\alpha_p$ would have a devastating effect. I wonder if one can design a two-step simulation study as follows. In the first step one determines the $c$ such that the test would be seriously affected if one ignores $\alpha_p$ such that $|\alpha_p| > c$. This defines the underfitting and overfitting of lag orders in terms of the effect upon the test performance. In the second step we determine the level of significance such that the probability of underfitting is very small though the probability of overfitting may be sizeable.

For the case where $p - 1 \ge 1$ in (6.20') and $\rho^0 = 1$ Choi (1993) proves that the direct OLS of $c = (c_1, \dots, c_p)$ in (6.24) is $\sqrt{T}$-consistent (rather than $T$-consistent), and that $\sqrt{T}(\hat c - c)$ is asymptotically Gaussian with a rank-deficient covariance matrix. (The deficiency is 1.) There are perhaps a number of applications of this interesting discovery. Choi (1993) applies it to a confidence interval for $(1 - c_1 - \dots - c_p)$ using $\hat c$ of an AR with an overfitted lag order. The interval would give some idea regarding how large the dominating root of the characteristic polynomial could be.

In Hatanaka and Koto (1994) lag orders in Dickey-Fuller equations such as (6.23') are chosen with the 5% significance of $t$-values of the highest lag order coefficient, $\alpha_p$ in (6.23'). With lag orders so determined the dominating roots of characteristic polynomials with coefficients estimated by OLS are found complex-valued in a number of variables, for example, in the historical real GNP and nominal GNP data for the USA. The problem requires a careful examination.⁴

6.2.4 MA case

Hall (1989, 1992) considers the case where $\{\Delta x_t\}$ is MA($q$),

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. Consider the instrumental variable estimation of $\rho$ in $x_t = \rho x_{t-1} + u_t$ with $\rho^0 = 1$. In so far as $k > q$, $x_{t-k}$ is uncorrelated with $\Delta x_t$. Since $x_{t-k}$ is correlated with $x_{t-1}$, $x_{t-k}$ is a valid instrument for $x_{t-1}$. The instrumental variable estimator of $\rho$ is

4 Chan and Wei (1988) investigate the OLS in the autoregressive process which contains a complex unit root. Fukushige, Hatanaka, and Koto (1994) develop a method to test for stationarity against non-stationarity in the situation in which the dominating roots may possibly be complex-valued.

(6.28)


which is free from all nuisance parameters. The analogues of the $t$- and the $F$-tests are also presented in Hall (1989, 1992).
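The instrumental variable idea described above can be sketched numerically. The following is a minimal illustration, not the procedure of Hall (1989, 1992) in full: it assumes the textbook IV formula $\hat{\rho}_{IV} = \sum x_{t-k} x_t / \sum x_{t-k} x_{t-1}$, and it assumes an MA($q$) written as $\Delta x_t = \varepsilon_t + b_1 \varepsilon_{t-1} + \ldots + b_q \varepsilon_{t-q}$ (the sign convention of (6.28) is not recoverable here). The function names `iv_rho` and `simulate_ma_unit_root` are hypothetical.

```python
import numpy as np

def iv_rho(x, k):
    """IV estimate of rho in x_t = rho * x_{t-1} + u_t, using x_{t-k} as instrument.

    Assumes k exceeds the MA order q of the differences, so that x_{t-k}
    is uncorrelated with Delta x_t.
    """
    x = np.asarray(x, dtype=float)
    if k < 1 or k >= len(x):
        raise ValueError("k must satisfy 1 <= k < len(x)")
    y = x[k:]           # x_t
    lag1 = x[k - 1:-1]  # x_{t-1}
    inst = x[:-k]       # x_{t-k}
    return float(inst @ y / (inst @ lag1))

def simulate_ma_unit_root(T, b, rng):
    """Simulate x_t with Delta x_t = eps_t + b_1 eps_{t-1} + ... (assumed sign convention)."""
    b = np.asarray(b, dtype=float)
    q = len(b)
    eps = rng.standard_normal(T + q)
    coeffs = np.concatenate(([1.0], b))
    # Keep only the entries of the convolution for which all q lags of eps exist.
    dx = np.convolve(eps, coeffs, mode="full")[q:q + T]
    return np.cumsum(dx)
```

For example, `iv_rho(simulate_ma_unit_root(2000, [0.5], np.random.default_rng(0)), k=3)` should return a value close to unity, since $k = 3 > q = 1$.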

ARMA models will be dealt with in Chapter 7.

Hamilton (1994: 497-512) is a good reference that gives the same material as Sections 6.1 and 6.2 in a different representation.

6.3 MA Unit-Root Test

I am now ready to explain the asymptotic distribution of the MA unit-root test statistic given earlier in Section 3.4.2.

6.3.1 Constant trend

Suppose that the model is given by (3.11a) and (3.11b) and that $H_0$ is $\delta = 1$ whereas $H_1$ is $\delta < 1$. The test statistic is (3.12') which will be written $S_0$. Under $H_0$ $v = \varepsilon - T^{-1}(e'\varepsilon)e$, where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_T)'$. The $t$-th element of $(I + D)v$ is


Let us analyse

(6.29)

assuming that $\{\Delta x_t\}$ is generated by (6.28). First,

where $\sigma^2$ is the long-run variance of $\Delta x_t$. Second, since $\Delta x_t$ is uncorrelated with $x_{t-k}$

Note that $\delta$ in (6.14) and (6.18) does not appear here.5 Thus

5 That $\delta$ in (6.14) does not appear in the present case can be seen as follows. Develop $T^{-1} \sum x_{t-k} \Delta x_t$ along (6.17a) and observe that the second term of the last expression of (6.17a) is here

where the terms in two [ ] are uncorrelated in the present case where $b_j = 0$ if $j > q$. What corresponds to (6.17c) is $O_p(T^{-1/2})$ here.


Thus

where $r = (t/T)$. Regarding the numerator of (3.12')

$v_0(r) = w(r) - r\,w(1)$. (6.31)

(6.30)

Clearly $T^{-1} \times$ the denominator of (3.12') converges to $\sigma_\varepsilon^2$, and

(6.32)

This expression as well as its tabulation is found in Kwiatkowski et al. (1992).
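The limiting functional can be explored by simulation. The sketch below assumes, in line with the Kwiatkowski et al. (1992) reference, that the limit in (6.32) is $\int_0^1 v_0(r)^2\,dr$ with $v_0(r) = w(r) - r\,w(1)$; the display itself is not reproduced here, so this is an assumption. The function name `simulate_bridge_functional` is hypothetical. Since $E[v_0(r)^2] = r(1 - r)$, the mean of the simulated functional should be near $1/6$.

```python
import numpy as np

def simulate_bridge_functional(T=500, n=4000, seed=0):
    """Draw n replications of a Riemann-sum approximation to
    int_0^1 (w(r) - r w(1))^2 dr, with w built from scaled partial sums."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n, T))
    w = np.cumsum(eps, axis=1) / np.sqrt(T)  # w(t/T), t = 1..T
    r = np.arange(1, T + 1) / T
    v0 = w - r * w[:, -1:]                   # w(r) - r * w(1)
    return np.mean(v0 ** 2, axis=1)          # approximate integral over [0, 1]
```

Upper quantiles of the returned draws (for instance via `np.quantile`) give approximate critical values for a one-sided test based on this functional.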

6.3.2 Linear trend

To introduce a linear deterministic trend modify the model into

$\Delta x_t = \theta_1 + \varepsilon_t - \delta \varepsilon_{t-1}$ (6.33a)

$x_1 = \theta_0 + \varepsilon_1$. (6.33b)

$H_0$ and $H_1$ are the same as before. The test statistic is identical to (3.12') except that what is subtracted is a linear trend instead of mean. Thus $v = (v_1, \ldots, v_T)'$, $v_t = x_t - \hat{\theta}_0 - \hat{\theta}_1 t$, where $(\hat{\theta}_0, \hat{\theta}_1)$ is the OLS estimate of $(\theta_0, \theta_1)$. We denote the new $S$ by $S_1$. Orthogonalize the regressor $(1, t)$ into $(1, t - (T+1)/2)$. Under $H_0$

The $t$-th element of $(I + D)v$ is

Since


The right-hand side of (6.33) is tabulated in Kwiatkowski et al. (1992).

MacNeill (1978) gives a general formula for the partial sum of residuals in the fitting of polynomial trend, and (6.31) and (6.34) are special cases of the formula therein.

6.3.3 ARMA case

Let us revert to a constant trend, $\theta_0$, but generalize the i.i.d. $\varepsilon_t$ to $u_t$ in (3.13c). The test statistic is (3.14') with $\gamma$ replaced by $\hat{\gamma}$. I shall give only a sequence of hints to derive the asymptotic distribution. Initially assume that $b_1 = \ldots = b_q = 0$ so that $\{u_t\}$ is AR($p$). Let $\Pi = I - a_1 L^* - \ldots - a_p L^{*p}$, where $L^*$ is defined in connection with (3.12'). $\sigma_\varepsilon^{-2}\,\Pi E(uu') \Pi' \approx I$ so that $\Sigma^{-1} \approx \Pi'\Pi$. I replace the GLS estimate of $\theta_0$ by the OLS so that $v(\gamma) = (I - e(e'e)^{-1}e')u$. Since $\Pi$ and $e(e'e)^{-1}e'$ nearly commute, $\Pi v(\gamma) \approx (I - e(e'e)^{-1}e')\varepsilon = z$. Therefore

Since $\Pi$ and $D'$ nearly commute, $\Pi D' \Pi^{-1} - D' = (\Pi D' - D'\Pi)\Pi^{-1} \approx 0$. Likewise $\Pi'^{-1} D \Pi' - D \approx 0$, and $\Pi D' \Pi^{-1} \Pi'^{-1} D \Pi' \approx D'D$. Thus (6.36) is

approximately

which is the same as (6.32). Replacing $\gamma$ by $\hat{\gamma}$ has no effect upon the asymptotic distribution.

When an MA is added as in (3.13c), represent $u_t$ by an AR with its order extending possibly to infinity. Now $\Pi = I - a_1 L^* - \ldots - a_{T-1} L^{*T-1}$. Since $\{a_j\}$


Using (6.9) we see that this converges to

The result is

(6.34)

The result is that

(6.35)

(6.36)


is bounded by a decaying exponential, the above reasoning goes through. I have followed Saikkonen and Luukkonen (1993a) in a part of the above reasoning.

6.3.4 Polynomial trend

The above method can be extended to the case of a polynomial trend in a straightforward manner. See Hatanaka and Koto (1994).

Salient points in Chapter 6 are summarized at the end of Chapter 7.


7

Regression Approach to the Test for Difference Stationarity (II)

In Chapter 5 we have considered how to test for DS against TS in the case where $\Delta x_t$ is i.i.d. and $\{x_t\}$ may possibly have a linear deterministic trend. In Section 6.2 a mathematical analysis is given on the case where $\{\Delta x_t\}$ is serially correlated with zero mean. Here we reintroduce a non-zero mean, and test for DS against TS with serially correlated $\Delta x_t$ and possible presence of a linear trend in $x_t$.

7.1 The Case where $\Delta x_t$ is an AR

Let us consider the case where $\{\Delta x_t\}$ is AR($p$) possibly with a non-zero mean under the DS hypothesis. An AR model that includes both the DS and TS is

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$. The Fuller reparametrization (6.21) transforms (7.1) into

where $\mu + \beta t \equiv (1 - \rho L)(1 - a_1 L - \ldots - a_{p-1} L^{p-1})(\theta_0 + \theta_1 t)$. In particular, $\beta = \theta_1 (1 - \rho)(1 - a_1 - \ldots - a_{p-1})$ so that $\beta = 0$ if either $\rho = 1$ or $\theta_1 = 0$. Moreover $\rho = 1 \Leftrightarrow \alpha_1 = 1$, and the constraint on parameters of (7.2) induced by $\rho = 1$ is $(\beta, \alpha_1) = (0, 1)$. The $\theta_0$ vanishes from (7.2) if $\rho = 1$.

When $\{x_t\}$ is generated by $\rho^0 = 1$, the $F$-statistic to test for $(\beta, \alpha_1) = (0, 1)$ in (7.2) is asymptotically distributed as indicated on the right-hand side of (6.8c) free from all nuisance parameters, $(\theta_0, \theta_1, a_1, \ldots, a_{p-1}, \sigma_\varepsilon^2)$. This asymptotic distribution is tabulated in Dickey and Fuller (1981, table VI, $n = \infty$). The $t$-statistic to test for $\alpha_1 = 1$ can also be used. Its asymptotic distribution is the right-hand side of (6.8b) and tabulated in Fuller (1976: 373, table 8.5.2, the part for $\hat{\tau}_\tau$, $n = \infty$), but I recommend the $F$-statistic because it tests a larger number of constraints implied by $\rho = 1$.
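A minimal sketch of the two statistics follows. Equation (7.2) is not legible in this copy, so the code assumes the usual Fuller form $x_t = \mu + \beta t + \alpha_1 x_{t-1} + \alpha_2 \Delta x_{t-1} + \ldots + \alpha_p \Delta x_{t-p+1} + \varepsilon_t$; the function name `adf_trend_tests` is hypothetical. The statistics must be compared with the Dickey-Fuller tables cited above, not with the standard $t$ and $F$ tables.

```python
import numpy as np

def adf_trend_tests(x, p):
    """Return (t-statistic for alpha_1 = 1, F-statistic for (beta, alpha_1) = (0, 1))
    from OLS of x_t on (1, t, x_{t-1}, dx_{t-1}, ..., dx_{t-p+1})."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                       # dx[i] = x[i+1] - x[i]
    t_idx = np.arange(p, len(x))          # rows for which every lag exists
    ones = np.ones(len(t_idx))
    lags = [dx[t_idx - j - 1] for j in range(1, p)]  # dx_{t-1}, ..., dx_{t-p+1}

    # Unrestricted regression.
    X = np.column_stack([ones, t_idx.astype(float), x[t_idx - 1]] + lags)
    y = x[t_idx]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    s2 = resid @ resid / dof
    se_alpha1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
    t_stat = (coef[2] - 1.0) / se_alpha1

    # Restricted regression: beta = 0 and alpha_1 = 1, i.e. dx_t on (1, lagged dx).
    Xr = np.column_stack([ones] + lags)
    yr = y - x[t_idx - 1]
    coef_r, *_ = np.linalg.lstsq(Xr, yr, rcond=None)
    resid_r = yr - Xr @ coef_r
    f_stat = ((resid_r @ resid_r - resid @ resid) / 2.0) / s2
    return float(t_stat), float(f_stat)
```

Because the restricted model imposes two constraints, the numerator of the $F$-statistic is divided by 2.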

The finite sample distributions in these tables are not applicable to the finite sample distributions in the present case of serially correlated $\Delta x_t$. DeJong et al. (1992a) perform a detailed simulation study, and conclude that the empirical size is close to the nominal size when $T$ is 100. (The empirical size is the size of the test obtained from the simulated, finite sample distribution, and the nominal

(7.1)

(7.2)


size is what would be expected if the finite sample distribution is perfectly approximated by the limiting distribution.)

If the deterministic trend is confined to a constant in both the DS and TS hypotheses, we may run the regression

and apply the $F$-test for $(\mu, \alpha_1) = (0, 1)$ or the $t$-test for $\alpha_1 = 1$. The limiting distributions under the DS are free from nuisance parameters $(a_1, \ldots, a_{p-1}, \sigma_\varepsilon^2)$. Moreover, they are identical to those for the case where $\Delta x_t$ is i.i.d. The limiting distribution of the $t$-statistic is in Fuller (1976: 373, table 8.5.2, $\hat{\tau}_\mu$, $n = \infty$), and that of the $F$-statistic in Dickey and Fuller (1981, table IV, $n = \infty$).

Important points in the present Section 7.1 are reproduced at the end of the present chapter.

7.2 ARMA in General and the Schwert ARMA

7.2.1 ARMA in general

As for the case where $\Delta x_t$ is an MA or an ARMA there exist three lines of thought.

(i) Suppose that $\{\Delta x_t\}$ is MA($q$) and $k > q$. Then $x_{t-k}$ is uncorrelated with $\Delta x_t$, and it leads to the instrumental variable estimator as suggested in (6.29). This idea due to Hall (1989) is extended to an ARMA($p$, $q$) model in Pantula and Hall (1991) and Hall (1992a, b). Hall (1992b) contains determination of lag orders by the general-to-specific principle with the significance of the highest order coefficient.

(ii) An invertible MA can be inverted into a stationary AR, though a nearly non-invertible MA requires a large lag order of AR to maintain an adequate degree of approximation. Said and Dickey (1984) provides a mathematical analysis to justify the augmented Dickey-Fuller test performed on the AR approximation of the MA that represents $\{u_t\}$.

(iii) The ARMA model has been dealt with by some forms of the non-linear least squares without inverting the MA part into an AR (see Box and Jenkins (1976: 273), Ansley (1979), and Granger and Newbold (1986: 92)). In this framework Solo (1984) tests the DS by the Lagrange multiplier method.

The approach (i) will be explained in greater detail. Admittedly $q$ is not known, but it can be estimated from residuals of the regression of $x_t$ upon $(1, t)$. The $k$ should be chosen to exceed the estimate of $q$ with a sufficient margin. To test the DS set up the equation,

Run the instrumental variable calculation with the instrument $(1, t, x_{t-k})$ for the regressor $(1, t, x_{t-1})$. If $\{\Delta x_t\}$ is MA($q$) possibly with a non-zero mean, Hall

(7.3)

residual,


where $\hat{\rho}_I$ is the instrumental variable estimator of $\rho$ and $w_2(r)$ has been defined in (6.5).

If $\{\Delta x_t\}$ is ARMA($p$, $q$), run the instrumental variable calculation in (7.2) with instruments $(1, t, x_{t-k}, \Delta x_{t-k}, \ldots, \Delta x_{t-k-p+1})$ for regressors $(1, t, x_{t-1}, \Delta x_{t-1}, \ldots, \Delta x_{t-p})$, where $k > q$. The asymptotic distribution of the estimator of $\alpha_1$ as well as the $t$-statistic here involves the long-run variance of the MA part of $\Delta x_t$, which has to be estimated (see Pantula and Hall (1991)).

A computationally simpler method for the case of ARMA is to proceed along Said and Dickey (1984) and apply the augmented Dickey-Fuller test on (6.23') or (7.2) with a lag order sufficiently large to accommodate a good AR approximation to an MA. This is the method that I have used to get the results presented in Chapter 9. The most serious problem here is how to choose the AR lag order, $p$. The asymptotic theory in Said and Dickey (1984) suggests that $p$ should go to $\infty$ as $T \to \infty$ and moreover that $T^{-1/3} p \to 0$. But this hardly provides a useful guide in practical cases where $T$ is small. I have adopted the general-to-specific principle explained in Section 6.2.3 for the pure AR. The $p_{\max}$ is 12 for the annual data with $T$ ranging between 80 and 120, and 24 for the seasonally adjusted quarterly data with $T$ ranging from 130 to 170. I now wonder if the $p_{\max}$ should have been larger in the quarterly data. The $t$-value for the highest order coefficient is examined with 5% significance level, which is higher than what seems to be commonly used. Some simulation studies are indicated in Section 8.3, but I have had no time for an adequate simulation experiment.
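The general-to-specific rule just described can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used for Chapter 9: the regression is assumed to have the form $x_t$ on $(1, t, x_{t-1}, \Delta x_{t-1}, \ldots, \Delta x_{t-p+1})$, the $t$-value of the highest lag is compared with the two-sided 5% normal critical value 1.96, and each candidate order uses its own available sample. All function names are hypothetical.

```python
import numpy as np

def _adf_design(x, p):
    """Regressors (1, t, x_{t-1}, dx_{t-1}, ..., dx_{t-p+1}) and regressand x_t."""
    dx = np.diff(x)
    t_idx = np.arange(p, len(x))
    cols = [np.ones(len(t_idx)), t_idx.astype(float), x[t_idx - 1]]
    cols += [dx[t_idx - j - 1] for j in range(1, p)]
    return np.column_stack(cols), x[t_idx]

def _t_values(X, y):
    """Conventional OLS t-values for the hypothesis that each coefficient is zero."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef / se

def select_p_general_to_specific(x, p_max, crit=1.96):
    """Start from p_max and lower p until the highest lagged difference is significant.

    Returns p (so the number of lagged differences is p - 1); returns 1 if no
    lagged difference is significant at any order.
    """
    x = np.asarray(x, dtype=float)
    for p in range(p_max, 1, -1):
        X, y = _adf_design(x, p)
        if abs(_t_values(X, y)[-1]) > crit:
            return p
    return 1
```

Because the search stops at the first significant highest-order coefficient, the rule leans towards overfitting rather than underfitting, which is the direction the text regards as less harmful.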

7.2.2 Schwert ARMA

Phillips and Perron (1988), Schwert (1989), and Campbell and Perron (1991) note a case of MA that requires a special caution. Consider the simplest case, $u_t = \varepsilon_t - b_1 \varepsilon_{t-1}$, $\beta = \mu = 0$ in (7.3) so that

Suppose that $\rho = 1$ so that $x_t$ is DS and that $b_1$ is close to but less than unity. If $b_1$ is indeed equal to unity (7.4) contains redundant factors $(1 - L)$ on both sides, and after cancelling them (7.4) is reduced to $x_t = \varepsilon_t$; $x_t$ is stationary and $\Delta x_t$ has an MA unit root. More generally, if the MA representation of $\{u_t\}$ in (7.3) has a root close to but less than unity, there is (i) near-redundancy and also (ii) near-stationarity as well as (iii) near-non-invertibility. The redundancy is a type of non-identification, and the near-redundancy is associated with the likelihood functions which are like a tableland in particular directions.1 In (7.4) with $\rho = 1$

1 See Clark (1988) for more about the near-redundancy. The redundancy or, in general, the lack of identification cannot be tested on the basis of data.


(1989) shows that

(7.4)


it is seen that $b(1)$ defined in Section 2.2 is $1 - b_1$. Thus the near-stationarity here is characterized by $b(1)$ being close to zero, i.e. an MA near unit root. The spectral density of $\{u_t\}$ is lowest at zero frequency in its neighbourhood, because $f(\lambda) = \sigma_\varepsilon^2 |1 - b_1 \exp(i\lambda)|^2 = \sigma_\varepsilon^2 (1 - 2 b_1 \cos\lambda + b_1^2)$, $f(0) = \sigma_\varepsilon^2 (1 - b_1)^2$, $f'(0) = 0$, and $f''(0) > 0$. I shall call the situation (where $\rho = 1$ and $b_1$ is close to unity) the Schwert MA, because it is his simulation study that has revealed a devastating effect that it may entail on the inference.

Intensive simulation studies have been made by Schwert (1989), DeJong et al. (1992a), and Agiakloglou and Newbold (1992) on the augmented Dickey-Fuller $t$-statistic to test $\alpha_1 = 1$ in (6.23') for the case where $T = 100$ and (6.20') is an AR approximation to the MA, $u_t = \varepsilon_t - b_1 \varepsilon_{t-1}$. When $b_1 = 0.8$ they find the empirical size appreciably exceeding the nominal size unless the lag order, $p - 1$, in (6.20') is taken much larger than Akaike (1973) suggests. The situation seems hopeless if $b_1 = 0.9$ while $T = 100$. How frequently do we observe the Schwert MA in economic time series? Perron (1988) claims that there are none in the fourteen time series which Nelson and Plosser (1982) analysed. DeJong et al. (1992a) also make an optimistic conjecture. On the other hand, Schwert (1987) finds the Schwert MA in his fitting of ARMA models to a number of economic time series. The discrepancy among their judgements might be explained in part by difficulties in selecting AR and/or MA orders, about which readers are referred to Christiano and Eichenbaum (1990).

It has been stated that the Schwert MA is a case of DS models which are close to TS. Conversely, if $\{x_t\}$ is DS but close to TS, $\Delta x_t$ has little spectral density at the origin of the frequency domain. Attempting to generalize the Schwert MA let us consider ARMA representations of $\Delta x_t$ that could have such spectral density functions.

Set $\sigma_\varepsilon^2 = 1$. The spectral density function of $\Delta x_t$ is

It is seen that

Since $(1 - \rho)$ and $(1 - \rho\delta)$ are both positive, $f''(0) > 0$ if $\delta > \rho$. Thus if $1 > \delta > \rho > -1$, the spectral density is lowest at the origin over some domain that surrounds it.

The ARMA models of which the spectral density is lowest at the origin over some domain about it will be called the Schwert ARMA. They represent a class of DS models that are close to TS. Examples are shown in Figures 7.1(a) and (b). In Figure 7.1(a) $\rho = 0.928$, $\delta = 0.95$, and the valley centred at zero frequency is narrow and shallow. In Figure 7.1(b) $\rho = 0$, $\delta = 0.8$, and the valley is wide and

(7.5)


deep. One might say that 7.1(b) is closer to TS than 7.1(a). Some realizations of Schwert ARMA look as though its integration order is varying over time between $I(1)$ and $I(0)$.

The near-boundary cases between DS and TS consist of (i) the DS that is close to TS and (ii) the TS that is close to DS. There is no way of knowing which of the two is more frequent in economic time series, but we shall see in Section 9.2 how frequent the near-boundary cases are.
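The two examples can be checked numerically. The sketch below assumes that the lost display (7.5) is the ARMA(1,1) spectral density $f(\lambda) = |1 - \delta e^{i\lambda}|^2 / |1 - \rho e^{i\lambda}|^2$ with $\sigma_\varepsilon^2 = 1$ and without the $1/(2\pi)$ factor, following the convention used for the Schwert MA above. Differentiating this assumed form twice gives $f''(0) = 2(\delta - \rho)(1 - \rho\delta)/(1 - \rho)^4$, which agrees with the sign condition stated in the text. The function names are hypothetical.

```python
import numpy as np

def schwert_arma_spectrum(lam, rho, delta):
    """Assumed spectral density of (1 - rho L) dx_t = (1 - delta L) eps_t, unit variance."""
    lam = np.asarray(lam, dtype=float)
    num = 1.0 - 2.0 * delta * np.cos(lam) + delta ** 2
    den = 1.0 - 2.0 * rho * np.cos(lam) + rho ** 2
    return num / den

def curvature_at_zero(rho, delta):
    """Second derivative of the assumed density at lambda = 0."""
    return 2.0 * (delta - rho) * (1.0 - rho * delta) / (1.0 - rho) ** 4

# The two parameter pairs quoted for Figure 7.1.
for label, rho, delta in [("7.1(a)", 0.928, 0.95), ("7.1(b)", 0.0, 0.8)]:
    f0 = float(schwert_arma_spectrum(0.0, rho, delta))
    fpi = float(schwert_arma_spectrum(np.pi, rho, delta))
    print(label, "f(0) =", round(f0, 4), "f(pi) =", round(fpi, 4),
          "f''(0) =", round(curvature_at_zero(rho, delta), 4))
```

Under this form the density rises monotonically from $\lambda = 0$ to $\lambda = \pi$ whenever $1 > \delta > \rho > -1$, so the valley at the origin is the global minimum; the ratio $f(0)/f(\pi)$ is much smaller for the second pair, in line with the description of a deeper valley.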

7.3 Miscellaneous Remarks

7.3.1 Tabulating limiting distributions

Many researchers including Perron (1989) and Johansen and Juselius (1990) have suggested that the distributions of functionals of the Wiener process can be tabulated by straightforward simulations. For example, to tabulate the limit distribution in (4.8), they would generate $Tn$ pseudo-random numbers distributed in $N(0, 1)$, $\varepsilon_{ti}$, $t = 1, \ldots, T$, $i = 1, \ldots, n$, for example, with $T = 1{,}000$ and $n = 10{,}000$; calculate

for each $i$, and compile the results in an empirical probability distribution. The integrals in (4.8) are evaluated by summing relevant functionals over $T$ infinitesimally narrow intervals in which $[0, 1]$ is subdivided. The distributions of far more complex functionals of the Wiener process are derived this way. However, one should carefully investigate properties of expressions to be simulated in each case. Koto and Hatanaka (1994) find that expressions for the statistics concerning

FIG. 7.1 Spectral density functions of Schwert ARMAs


structural changes require a larger $T$ (say, 5,000) to deal with the discontinuity involved. In any case computation costs are little.

As for the bootstrapping method see Basawa et al. (1991a, 1991b) to avoid a pitfall.2
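The tabulation procedure can be sketched as below. Since (4.8) is not reproduced in this section, the sketch uses the Dickey-Fuller coefficient statistic without deterministic terms, $T \sum x_{t-1}\varepsilon_t / \sum x_{t-1}^2$, as a stand-in functional; that choice, the function name `tabulate_df_coefficient`, and the default values of $T$ and $n$ are assumptions made for illustration.

```python
import numpy as np

def tabulate_df_coefficient(T=500, n=5000, probs=(0.01, 0.05, 0.10), seed=0):
    """Empirical lower quantiles of T * sum(x_{t-1} eps_t) / sum(x_{t-1}^2)
    over n simulated random walks of length T with N(0, 1) increments."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n, T))                  # eps_{ti}, one row per replication
    x = np.cumsum(eps, axis=1)                         # random walk with x_0 = 0
    x_lag = np.hstack([np.zeros((n, 1)), x[:, :-1]])   # x_{t-1}
    stat = T * np.sum(x_lag * eps, axis=1) / np.sum(x_lag ** 2, axis=1)
    return dict(zip(probs, np.quantile(stat, probs)))
```

Raising `T` refines the approximation of the integrals by sums, and raising `n` reduces the sampling error of the tabulated quantiles; the two are separate sources of error, which is why statistics involving discontinuities may call for a larger `T` as noted above.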

7.3.2 Power

Powers of the Dickey-Fuller one-sided $t$-test in (5.3) and of the one-sided $t$-test and the $F$-test in (7.2) are investigated in the simulation study of DeJong et al. (1992a, 1992b). Assuming the lag order $p - 1 = 1$ in (6.20') the powers of the $t$- and $F$-tests are very low against $\rho = 0.9$ with $T = 100$. I cannot mention a single number for the power as it depends upon $a_1$ in (6.20') and the initial values as well as $\rho$ and $T$, but I would say it is about 0.15 or 0.20. See also Agiakloglou and Newbold (1992) and Schmidt and Phillips (1992).

7.3.3 Estimation of the long-run variance

None of the test statistics that I gave above (except for the method in Pantula and Hall (1991)) estimates the long-run variance, and their limiting distributions do not contain the long-run variance either. However, in some other methods that I have not given above limiting distributions are made free from the long-run variance by an explicit estimator of this parameter. Explanations and cautions are required regarding this estimation. The $\sigma^2$ in (6.12) to (6.16) is $(2\pi)$ times (the spectral density of $\{\Delta x_t\}$ at the zero frequency). The non-parametric estimate of $\sigma^2$ (the estimate without fitting a model of a finite number of parameters such as ARMA) is equivalent to the estimate of the spectral density at zero frequency.

(a) In general the estimate has a downward bias when $\{x_t\}$ and hence $\{\Delta x_t\}$ are demeaned or detrended and $T$ is small, say less than 100. See Granger and Newbold (1986: 64) and Neave (1972).

(b) Earlier in (3.4) to (3.5') the spectral density function was looked through the Bartlett window. Here I reproduce that explanation in a more general context. In estimating the spectral density function some weights such as (3.4) have to be placed on estimates of the autocovariance sequence. The weights are called lag window. The Fourier transform of such weights is called the spectral window (function),3 and the Parzen window and the Tukey-Hanning window are widely used. See Fuller (1976: 296-9). The $k$ in (3.4) also appears in weights other than that for the Bartlett window, and is called the truncation point in general. It has to be less than $T - 1$ by construction, and in fact substantially less than $T$ according to the spectral estimation theory. As a result it is as though we estimate the spectral density function looked through a window rather than the original

2 I owe this point to Mototsugu Fukushige.

3 The window is also called the 'kernel'.


spectral density function. This brings about a bias in the spectral estimate unless the original spectral density $f(\lambda)$ is constant over the entire frequency domain $[-\pi, \pi]$. To achieve the consistency, the spectral theory suggests we increase the truncation point $k$ as $T \to \infty$ so that the spectral window (function), which is (3.6) in the case of the Bartlett window, becomes zero at $\lambda \neq 0$ and unity at $\lambda = 0$ as $T$ expands to $\infty$. But a bias is inevitable in finite samples unless $f(\lambda)$ = a constant.

In relation to the long-run variance we are concerned with the bias of spectral estimate at $\lambda = 0$. If the original density is highest (lowest) at $\lambda = 0$ in its neighbourhood as illustrated by a solid curve in Figure 7.2(a) and (b), the windowed spectral density, the dotted curve, is lower (higher) than the original density at $\lambda = 0$. Especially in Schwert ARMA one can hardly expect a reasonable estimate of the spectral density at zero frequency in the case where $T$ is about 100. In fact the difficulty in Schwert ARMA is analogous to the one in Section 3.2. We should avoid the non-parametric estimation of the long-run variance in analysing macroeconomic data in so far as we cannot rule out a Schwert ARMA.4

The parametric estimation of the long-run variance is simple, at least conceptually. The expression (6.26) may be used in the case of an AR.

The augmented Dickey-Fuller test in its original version or in the version of Said and Dickey (1984) does not use any estimate of the long-run variance. The problem there is instead how to determine lag orders as described in Sections 6.2.3 and 7.2.1.
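The upward bias in the Schwert MA case can be made concrete. The sketch below assumes Bartlett weights of the form $1 - j/(k+1)$ (the exact form of (3.4) is not shown here) and the MA(1) $u_t = \varepsilon_t - b_1 \varepsilon_{t-1}$ with $\sigma_\varepsilon^2 = 1$, whose long-run variance is $(1 - b_1)^2$. The function names are hypothetical.

```python
import numpy as np

def true_lrv_ma1(b1):
    """Long-run variance of u_t = eps_t - b1 * eps_{t-1} with unit innovation variance."""
    return (1.0 - b1) ** 2

def windowed_lrv_ma1(b1, k):
    """Population value that a Bartlett estimator with truncation point k >= 1 targets:
    gamma_0 + 2 * (1 - 1/(k+1)) * gamma_1, since gamma_j = 0 for j > 1."""
    gamma0 = 1.0 + b1 ** 2
    gamma1 = -b1
    return gamma0 + 2.0 * (1.0 - 1.0 / (k + 1)) * gamma1

def bartlett_lrv(u, k):
    """Sample Bartlett estimate of the long-run variance from a series u."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    T = len(u)
    gamma = np.array([u[j:] @ u[:T - j] / T for j in range(k + 1)])
    weights = 1.0 - np.arange(1, k + 1) / (k + 1)
    return float(gamma[0] + 2.0 * np.sum(weights * gamma[1:]))
```

With $b_1 = 0.8$ and $k = 4$ the windowed target is $1.64 - 1.28 = 0.36$, against a true long-run variance of $0.04$, so the estimator aims at a value nine times too large; this is the overestimation pictured for a density that is lowest at the origin.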


4 There have been attempts to modify the test statistics based upon the OLS using some estimates of $\delta$ and $\sigma^2$ in (6.12)-(6.16) so that, when applied to the serially correlated $\Delta x_t$, the limiting distribution of the modified statistics are identical to those given in Chapters 4 and 5 for the case where $\Delta x_t$ is i.i.d. The estimates of $\delta$ and $\sigma^2$ are non-parametric. The simulation studies, Schwert (1989) and DeJong et al. (1992a) among others, reveal that the tests are inappropriate in regard to the size not only in the Schwert MA but in all other cases.

FIG. 7.2 Original and windowed spectral density functions: (a) underestimation, (b) overestimation


As for the selection of lag orders there are some guiding rules of thumb available for the finite sample situation, but there is no rule for the selection of truncation points in spectral estimates useful for the finite sample situation.

7.3.4 Seasonal adjustment

The historical data are annual, but the post-war data are quarterly. A problem with the latter is which of the seasonally unadjusted and adjusted data should be used. The most detailed investigation on this problem has been made in Ghysels and Perron (1993). The limiting null distribution of the $t$- and $F$-test statistics in Section 7.1 may be regarded virtually identical before and after seasonal adjustments by the X-11 method. A surprising finding is that when applied to the TS model the seasonal adjustment brings about an upward bias in the OLS estimator of $\alpha_1$. This renders the tests in Section 7.1 less powerful when applied to the seasonally adjusted data. However, the bias is not large if the lag order, $p - 1$, in (7.1) is as large as 4 in the case of quarterly data (see also Diebold (1993)).

7.3.5 Applications

The method given in Section 7.1 has been applied in a large number of empirical studies. The best known is Nelson and Plosser (1982), which has initiated an outburst in the unit-root field. It investigates fourteen historical data in the USA: real GNP, nominal GNP, real per capita GNP, industrial production, employment, unemployment rate, GNP deflator, CPI, wage, real wage, money stock, velocity, bond yield, and stock prices. The $t$-test rather than the $F$-test is used, and the DS is not rejected on all but one of the fourteen series. The exception is unemployment rate. A large number of post-war data in a number of countries is investigated in Stultz and Wasserfallen (1985) and Wasserfallen (1986). Readers are also referred to Schwert (1987), and DeJong et al. (1992a) for studies of a large number of US time series. As for some specific variables Hakkio (1986) analyses exchange rates, and Rose (1988) investigates real rates of interest in many countries. Diebold (1988) investigates exchange rates in the framework of the autoregressive conditionally heteroskedastic (ARCH) model with a unit root. Baillie and Bollerslev (1989) analyse daily data of the spot and the forward foreign-exchange rates.

All the studies mentioned in the previous paragraph had appeared before we began to pay close attention to deterministic trends.

Highlights of Chapters 4-7

The logical aspects of Chapters 4-7 are undoubtedly complicated. Therefore the following reproduction of some of the more important points may help readers.


(2) Demeaning and detrending have important effects upon the limiting distributions of relevant statistics. The following points provide a basis on which the effects can be assessed. Let $e = (e_1, \ldots, e_T)'$ be the residual in regressing $v = (v_1, \ldots, v_T)'$ upon $\iota = (1, \ldots, 1)'$. Then

See (4.8) and (4.10). Consider a model

and OLS, $(\hat{\mu}, \hat{\rho})$. If $\mu^0 \neq 0$ and $\rho^0 = 1$, $\{x_t\}$ contains a linear trend. $T^{3/2}(\hat{\rho} - 1)$ is asymptotically Gaussian, and the $t$-statistic to test $\rho = 1$ converges to $N(0, 1)$ (see Section 4.2.2). (However $(\hat{\rho} - 1)$ in the present OLS cannot be used for

where See (6.3) to (6.6) and (A3.10a).

(3) In Chapters 4-7 we have obtained asymptotically normal distributions in some cases, and non-standard distributions expressed as functionals of the Wiener process in others. Let $\{\varepsilon_t\}$ be i.i.d. with zero mean, and consider a model

and OLS, $\hat{\rho}$. If $\rho^0 = 1$,

$t$-statistic to test

where $w_1(r) = w(r) - \int_0^1 w(s)\,ds$. See (4.14). Let $e \equiv (e_1, \ldots, e_T)'$ be the residual in regressing $v = (v_1, \ldots, v_T)'$ upon $(\iota, t)$, where $t = (1, \ldots, T)'$. Then


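(1) Let $\{\varepsilon_t\}$ be i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma_\varepsilon^2$, and let $v_t = \sum_{s=1}^{t} \varepsilon_s$. When $w(r)$, $0 \le r \le 1$, is the standard Wiener process,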

(7.6)

(7.7)

(7.8)

(7.9)

(7.10)


See (4.16) for the former, and (6.10) for the latter. These statistics can be usedfor testing DS against TS when the deterministic trend is a priori confined to aconstant. Finally consider a model

$x_t = \mu + \beta t + \rho x_{t-1} + \varepsilon_t$, (7.11)

and OLS, $(\hat{\mu}, \hat{\beta}, \hat{\rho})$. If $\beta^0 \neq 0$ and $\rho^0 = 1$, $\{x_t\}$ contains a quadratic trend, and the situation is analogous to the case where $\mu^0 \neq 0$ and $\rho^0 = 1$ in (7.10). However, as stated in Section 5.2, the Dickey-Fuller test does not admit $\beta^0 \neq 0$ and $\rho^0 = 1$. If $(\beta^0, \rho^0) = (0, 1)$

See (6.8a) for the former, and (6.8c) for the latter. These statistics can be usedfor testing DS against TS in possible presence of a deterministic trend.

(4) Suppose that $\{u_t\}$ is a stationary process with zero mean and long-run variance $\sigma^2 = \sum_{j=-\infty}^{\infty} \gamma_j$ with $\gamma_j = E(u_t u_{t+j})$. Let $v_t = \sum_{s=1}^{t} u_s$. Then (7.6) and (7.7) hold after $\sigma_\varepsilon^2$ is replaced by $\sigma^2$, but (7.8) becomes

See (6.14).

(5) Consider the case where $\{u_t\}$ is a stationary AR($p - 1$). Then all the results on the test statistics in (3) above can be carried over with proper modifications through the Fuller reparametrization. Consider


testing DS against TS in the presence of a deterministic linear trend. The test is inconsistent under the TS model, $x_t = \mu + \beta t + \rho x_{t-1} + \varepsilon_t$, $|\rho| < 1$, as demonstrated in Section 5.1.) If $(\mu^0, \rho^0) = (0, 1)$, $\{x_t\}$ contains at most a constant trend, and

F-statistic to test

F-statistic to test


and OLS,

$F$-statistic to test

However, the limiting distributions of estimator $T(\hat{\alpha}_1 - 1)$ contain nuisance parameters in every one of three models introduced above. See Appendix 4.1.

(6) Let us return to the i.i.d. $\{\varepsilon_t\}$, but unlike (2) above consider here regressing $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_T)'$ upon $\iota$. Let $e = (e_1, \ldots, e_T)'$ be the residual, and denote $S_t = \sum_{i=1}^{t} e_i$. Then

where $v_0(r) = w(r) - r\,w(1)$ (see (6.30)). Then let $e = (e_1, \ldots, e_T)'$ be the residual in regressing $\varepsilon$ upon $(\iota, t)$. Then

where $v_1(r) = v_0(r) - 3(r^2 - r)w(1) + 6(r^2 - r)\int_0^1 w(s)\,ds$.

The results in (5) and (6) are extended to different modes of deterministic trends in Hatanaka and Koto (1994).

$t$-statistic to test

which is free from nuisance parameters that comprise those for the AR($p - 1$). $(\hat{\alpha}_2, \ldots, \hat{\alpha}_p)$ is asymptotically Gaussian (see Appendix (4.2)). Consider

Consider

If $(\beta^0, \alpha_1^0) = (0, 1)$ then

F-statistic to test


8

Viewing the Discrimination as a Model Selection Problem

including Deterministic Trends

In Chapters 5 and 7 a number of tests are derived under the specification that deterministic trends are linear, $\theta_0 + \theta_1 t$. Even though the tests are invariant to $\theta_1 = 0$ or $\neq 0$ they would be invalid if deterministic trends are other than a linear function of time. The successful discrimination between TS and DS depends on the valid selection of models for deterministic trends. In the present chapter polynomial trends and linear trends with intercept and slope changes are introduced. The judgement about DS vs. TS has to be made in conjunction with selection of modes of deterministic trends. This suggests a large array of models, and the encompassing theory in Mizon (1984) and Mizon and Richard (1986) will help us to organize our thinking regarding how to select the best one from this array of models in terms of their congruence with observations. The encompassing theory will be extended into a comparison of $P$-values for the testing for DS against TS and for the testing for TS against DS. Simulation results will be presented to investigate the separability of DS and TS models and the fairness of the comparison.

8.1 Various Modes of Deterministic Trends

A given stochastic process is decomposed into a deterministic part and a stochastic part. The deterministic part is not influenced by the stochastic part so that the former is exogenous to the latter. The deterministic trend is a (non-stochastic) function of time representing a slow and smooth movement. In terms of a polynomial function of time any orders higher than, say, a cubic are unlikely. This smoothness is attributed to that in exogenous variables of the economy such as resources, technology, etc. However, macroeconomic theories provide nothing useful for the model specification of deterministic trends. Especially with respect to the productivity growth the interpretations are divided between a version of the real business cycle theory and its criticism. At any rate sudden discrete changes are allowed in structures of deterministic trends only when they are thought to be caused by exogenous events such as wars, institutional changes, and possibly some decisions of governments and international organizations such as OPEC.


It is important to note that the deterministic trend in the TS model is the central line mentioned in Section 1.1.3. On the other hand the DS model has no central line. There the deterministic trend arises from $E(\Delta x_t)$ being non-zero, but there is no force operating to drive $x_t$ to the deterministic trend. Roles of deterministic trends are different between the DS and the TS models.

Especially on the historical data but on the post-war data as well it is obvious from the time-series charts (and in fact borne out by statistical tests) that the linear trend is not adequate to represent deterministic trends in many economic time series. Perron (1989a, 1990) introduced structural changes in deterministic trends. His deterministic trend is

where

representing changes in the slope and intercept at $t = T_B$. The analyses in Chapter 7 can be easily adapted to this deterministic trend. (7.1) is replaced by

It follows from (6.21) that

Thus (8.2) is transformed into

where $\xi_t$ involves $U_{T_B+j}(t)$, $j = 1, \ldots, p$ and $I_{T_B+j}(t)$, $j = 2, \ldots, p$. In particular, $\mu_1 = (1 - \alpha_1)\theta_1 + \alpha_1 \theta_3$ and $\beta_1 = (1 - \alpha_1)\theta_3$. The DS, $\rho = 1$, forces $(\beta_0, \beta_1, \alpha_1) = (0, 0, 1)$. $I_{T_B+j}(t)$, $j = 2, \ldots, p$ are not distinguishable from $I_{T_B+1}(t)$ asymptotically, and dropping $U_{T_B+j}(t)$, $j = 1, \ldots, p$ has no effects upon asymptotic distributions of OLS-based test statistics. We can ignore $\xi_t$ and proceed to the $F$-test for $(\beta_0, \beta_1, \alpha_1) = (0, 0, 1)$. The limiting distributions are given in Hatanaka and Koto (1994).

The intercept shift, $\theta_1 I_{T_B+1}(t)$, in (8.1) requires a caution, which does not seem to be noticed in the literature on structural changes in the unit-root field.

where

(8.3)

(8.1)

(8.2)

(8.4)

otherwise.


The point is that 6\ is not asymptotically identified if p = 1. (It is if \p\ < 1.)This is seen from the Fisher information matrix of (8.2), assuming that {er} isGaussian. It is analogous to the asymptotic unidentfiability of 00 in the model(7.1) in the case where p = 1, which has been pointed out in Schotman and vanDijk (1991a, b).

The trend function (8.1) contains four subclasses, (i) 6{ = 93 = 0, i.e. a lineartrend, (ii) 0i ^ 0 and 03 = 0, a shift in the intercept, (iii) 9\ = 0 and 93 ^ 0,a slope change, and (iv) 9\ ^ 0 and $3 ^ 0. Two kinds of model selectionare involved. The first is to select one of the four subclasses, and the secondis to select the breakpoint, TB in the subclasses, (ii)-(iv). In Perron (1989a,1990) and Perron and Vogelsang (\992b) both kinds of the model selection aremade intuitively. Later Zivot and Andrews (1992), Banerjee, Lumsdaine, andStock (1992), and Perron and Vogelsang (1992a) all proposed formal methods toselect the most likely, single breakpoint, given one of the subclasses, (ii)-(iv).1

They show that the procedure to select a breakpoint must be explicit in order to derive valid distributions of test statistics for DS against TS. Moreover, this consideration does make a difference in the results of tests on the historical and post-war time-series data. While the DS is rejected in most of the US time series in Perron (1989),2 the DS is not rejected in many of them once the selection of a breakpoint is taken into consideration in the inference theory. The diametrically opposed emphases in the results of Banerjee, Lumsdaine, and Stock (1992) and Zivot and Andrews (1992) are, as I see them, associated with the difference regarding which of the historical and the post-war time series are emphasized. Raj (1992) confirms the result of Zivot and Andrews (1992) on the international scale.

As for the other kind of model selection, i.e. selection of one of the subclasses, it does not seem to have been examined seriously in the literature. Admittedly the presence of a particular mode of structural change is tested together with selection of a breakpoint, but alternative modes of deterministic trends are not compared to select one from them. For example, a quadratic trend could be an alternative to a linear trend with a slope change and, moreover, a breakpoint need not be selected in the quadratic trend.3

A number of specification analyses have borne out the importance of selecting appropriate models of deterministic trends. Rappoport and Reichlin (1989), Perron (1989, 1990), and Hendry and Neale (1991) all find that when the true generating process is trend stationary with a linear trend containing a change in its slope, the test for DS with a simple linear trend fails to reject the DS too often. In short, disregard of a complicated deterministic trend, which is real, unduly favours the DS. On the other hand, I would think that some guard is required against a risk of an unreal, complicated deterministic trend to favour the TS.

1 Perron (1991b) further generalizes the method to allow for possibly multiple breakpoints for a given polynomial trend.

2 The finding in Mills (1992) on the UK is contrary to Perron (1989).

3 See Granger (1988a) for various forms of the trend function.


Those who have experience with extrapolation of economic time series would have recognized that parameters of trend functions vary little by little as the sample period is altered, indicating the necessity to introduce continuous but very slow time variations in parameters. I shall not discuss this problem, but it is advisable to keep in mind this somewhat shaky ground on which the discrimination between DS and TS is based.

A visual inspection of time-series charts for macroeconomic variables suggests the plausibility of the following two kinds of non-stationarity in the variance. One is that big shocks enter into the system once in a while apart from small shocks that are continuously operating. See Balke and Fomby (1991). The other is that shocks in a part of the sample period have a larger variance than shocks in the other part. The former may well be a useful model for some macroeconomic variables such as interest rates in the USA in the last twenty years, but I have no experience with the model. As for the latter kind of variance non-stationarity, it appears typically in historical data, in which the pre-war variance is larger than the post-war. Koto and Hatanaka (1994) show that disregarding the variance non-stationarity deteriorates only slightly the performance of our discrimination procedure to be presented below.

8.2 Encompassing and 'General-to-Specific' Principles on the Model Selection

The above discussions lead us to viewing the whole issue of DS vs. TS as a problem of model selection. The deterministic trends mentioned above are denoted as follows:

$M^{1s}$: possible change(s) in the slope of a linear trend,
$M^{1c}$: possible change(s) in the constant term of a linear trend, and
$M^{1cs}$: possible change(s) in the constant and slope of a linear trend.

Also let $M^0$, $M^1$, $M^2$, and $M^3$ be respectively a constant, a linear trend, a quadratic trend, and a cubic trend. The DS is denoted by subscript, 0, attached to these $M$s and the TS by subscript, 1. Then the entire models under consideration are arrayed as in Figure 8.1.

The lines in Figure 8.1(a) indicate logical implication. Each of $M_i^{1s}$, $M_i^{1c}$, and $M_i^{1cs}$, $i = 0, 1$, is not a single model but a class of models that have different breakpoints at one or more time points. The broken line at the top left corner of Figure 8.1(a) is to remind readers that the intercept shift is not identified asymptotically in the DS model.

FIG. 8.1(a) Array of models

Each of DS models with different deterministic trends may consist of I(2) and I(1). Thus the portion of the tree that contains $M_0^2$, $M_0^1$, and $M_0^0$ is as shown in Figure 8.1(b).

Our task is to select the best model from this array.

8.2.1 'General-to-Specific' Principle

The theoretically precise inference for the model selection is feasible only for the case where the whole array consists of a single trunk line, for which readers are referred to Anderson (1971: 34-43). However, an important guideline is available for a general case. It is the general-to-specific principle, which has been proposed by Mizon (1977) and Hendry (1979), in opposition to what had been done widely before then in practical studies of model selection. The proposal is to start the investigation from the most general model, going down to a more specific model only when it is justified by testing for the latter against the former. In the above array the DS models contain three trunk lines, $M_0^{1cs} \to M_0^{1c} \to M_0^1 \to M_0^0$, $M_0^{1cs} \to M_0^{1s} \to M_0^1 \to M_0^0$, and $M_0^3 \to M_0^2 \to M_0^1 \to M_0^0$. So do the TS models.

It is easy to test for a specific model against a general model in each of these trunk lines. Suppose that we wish to test for $M_1^1$ against $M_1^{1cs}$ with a single known breakpoint.4 The relevant model is (8.2), and we run the regression

which is (8.4) except that $\xi_t$ is dropped. If $|\rho| < 1$ and $\theta_1 = \theta_3 = 0$ so that the true values of $\beta_1$ and $\mu_1$ are zero, the t-values of $\beta_1$ and $\mu_1$ converge to

4 On $M_i^{1s}$, $M_i^{1c}$, $M_i^{1cs}$, $i = 0, 1$, the selection of breakpoint is inseparable from the testing for $M_i^1$ against $M_i^{1s}$ (or $M_i^{1c}$ or $M_i^{1cs}$), and admittedly it is desirable to select the breakpoint by a formal inference procedure as Zivot and Andrews (1992), Banerjee, Lumsdaine, and Stock (1992), and Perron and Vogelsang (1992a, b) did for testing $M_0^1$ against $M_1^{1s}$ (or $M_1^{1c}$ or $M_1^{1cs}$). I point out, however, that the precise inference is not feasible for the model selection from the whole array of Figure 8.1(a) aside from the breakpoint, because six trunk lines are involved. In my opinion there is not much gain in adhering to the precise inference on the breakpoint only.

FIG. 8.1(b) A portion of the array

(8.5)


$N(0, 1)$ asymptotically. Next consider testing for $M_0^1$ against $M_0^{1s}$ within the DS hypothesis, assuming I(1). Since $(\beta_0, \beta_1, \alpha_1) = (0, 0, 1)$ in (8.4) is now a maintained hypothesis, we run the regression

If the true value of $\mu_1 = \theta_3$ is zero, the t-value of $\mu_1$ converges to $N(0, 1)$ asymptotically. Similar procedures can be designed for testing $M_i^{j-1}$ against $M_i^j$, $i = 0, 1$, $j = 1, 2, 3$.
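The general-to-specific walk down a single trunk line can be sketched for the polynomial trunk $M^3 \to M^2 \to M^1 \to M^0$. This is a simplified illustration, not the procedure of the text: it assumes i.i.d. errors around the trend, omits the autoregressive terms and breakpoints of (8.5), and uses a plain t-test on the highest-order coefficient. The helper names `ols` and `select_trend_degree` and the default critical value are choices made for the sketch.

```python
import math
import random

def ols(X, y):
    """OLS via Gauss-Jordan inversion of X'X; returns (coef, t_values)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][j] * coef[j] for j in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    tvals = [coef[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return coef, tvals

def select_trend_degree(y, max_degree=3, crit=1.96):
    """Start from the most general trend and step down only while the
    highest-order coefficient is insignificant."""
    T = len(y)
    tau = [(t + 1) / T for t in range(T)]      # rescaled time, for stability
    for d in range(max_degree, 0, -1):
        X = [[s ** j for j in range(d + 1)] for s in tau]
        _, tvals = ols(X, y)
        if abs(tvals[d]) > crit:
            return d                            # cannot simplify further
    return 0

random.seed(0)
demo = [2.0 + 0.03 * (t + 1) + random.gauss(0.0, 1.0) for t in range(100)]
print("selected trend degree:", select_trend_degree(demo))
```

Because each step is a separate test, the overall size of the walk is not the nominal size of a single t-test, which is one reason the text treats the precise inference as feasible only for a single trunk line.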

However, $M_0^{1c}$ requires a caution. Because of the asymptotic unidentifiability mentioned above there is no formal asymptotic testing for the intercept shift in the DS model. My proposal is to look for an outlier in the series of $\Delta x_t = \mu_0 + \mu_1 1_{T_B+1}(t) + u_t$, where $\{u_t\}$ is stationary with zero mean and $1_{T_B+1}(t)$ is defined in (8.3). See Tsay (1988) for a survey of methods to identify outliers.

8.2.2 Encompassing principle

The traditional hypothesis testing is useful only for a comparison between two models that are in a nested relation. In Figure 8.1(a) any two models in different trunk lines are mutually non-nested. As for the comparison between two mutually non-nested general models, say $M_0$ and $M_1$, not necessarily those in Figure 8.1(a), Mizon (1984) and Mizon and Richard (1986) proposed the encompassing principle. To investigate how well $M_0$ encompasses $M_1$, to be abbreviated as $M_0 \xrightarrow{\mathcal{E}} M_1$, we investigate a statistic of $M_1$, say $\hat\beta_1$. Derive the probability limit of $\hat\beta_1$ under the assumption that $M_0$ is the data-generating process (DGP). The probability limit often depends upon unknown parameters of $M_0$, and, if it does, any consistent estimators of the parameters are substituted for them, where the consistency is judged under the assumption that $M_0$ is the DGP. Let an estimator of the probability limit be denoted by $\hat\beta_{1(0)}$. The encompassing test statistic to see how well $M_0 \xrightarrow{\mathcal{E}} M_1$ is $\hat\phi_{01} = \hat\beta_1 - \hat\beta_{1(0)}$. Suppose further that with a statistic $q$, which serves as a normalizer of $\hat\phi_{01}$,

has a limiting distribution under $M_0$, where $T$ is the sample size and $a$ is a real number. $S_{01}$ can be used for the analysis of $M_0 \xrightarrow{\mathcal{E}} M_1$.

The following example is a special case of an encompassing analysis in Mizon (1984). Two alternative explanatory variables, $x$ and $z$, are available for the explanation of $y$. Two alternative models are

where $x$ and $z$ are each $T \times 1$ and non-stochastic, $x \ne cz$ with any scalar $c$, $\beta$ and $\gamma$ are each scalar parameters, and $u$ and $v$ are distributed respectively in

(8.6)


$N(0, \sigma_u^2 I_T)$ and $N(0, \sigma_v^2 I_T)$. We are concerned with $M_0 \xrightarrow{\mathcal{E}} M_1$. A statistic of $M_1$ is

and we shall evaluate $\hat\beta_1$ under $M_0$. All mathematical expectations below are calculated assuming that $y$ is in fact generated by $M_0$. Then

which involves an unknown parameter, $\gamma$, of $M_0$. The $\gamma$ is estimated by OLS in $M_0$, and $\beta_{1(0)}$ is estimated by

The encompassing test statistic is

Since $M_x - M_x z (z' M_x z)^{-1} z' M_x$ is idempotent with rank $(T - 2)$, (8.8) is $\sigma^2 \chi^2(T - 2)$. Since $(M_x - M_x z (z' M_x z)^{-1} z' M_x)(M_x z (z'z)^{-1}) = 0$, (8.7) and (8.8) are independent. Writing

it is seen that

is distributed in the t-distribution with $(T - 2)$ degrees of freedom.

For the standard DGP not involving either a stochastic or a deterministic trend Mizon (1984) and Mizon and Richard (1986) demonstrate that the above procedure for $M_0 \xrightarrow{\mathcal{E}} M_1$ would be identical to the well-known likelihood ratio test for $M_0$ against $M_1$ if it were applied to the case where $M_0$ is nested in $M_1$ and if the statistic of $M_1$ is the Wald test statistic. If $M_1 \xrightarrow{\mathcal{E}} M_0$ is investigated in the same situation with the maximum-likelihood estimator of $M_0$, $\sqrt{T}(\hat\phi_{10}/q) = o_p(1)$, i.e. $M_1$ encompasses $M_0$ with probability 1.
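The two-regressor example can be traced numerically. Because the displayed models are not legible in this copy, the sketch does not commit to which of $x$ and $z$ belongs to which model: it calls the regressor of $M_0$ `w0` and that of $M_1$ `w1`, which is an assumption of the sketch, as are the function names. It computes the encompassing difference (the OLS coefficient of $M_1$ minus its estimated value implied by $M_0$) and normalizes it with the residual variance from the regression on both regressors, which has $(T - 2)$ degrees of freedom.

```python
import math

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def joint_fit(y, w0, w1):
    """OLS of y on (w0, w1) without intercept.
    Returns (coefficient on w1, its standard error, residual variance)."""
    a, b, c = dot(w0, w0), dot(w0, w1), dot(w1, w1)
    det = a * c - b * b
    c0 = (c * dot(w0, y) - b * dot(w1, y)) / det
    c1 = (a * dot(w1, y) - b * dot(w0, y)) / det
    rss = sum((yi - c0 * u - c1 * v) ** 2 for yi, u, v in zip(y, w0, w1))
    s2 = rss / (len(y) - 2)
    return c1, math.sqrt(s2 * a / det), s2

def encompassing_t(y, w0, w1):
    """Encompassing difference and its t-type normalization for
    'M0 encompasses M1' (assumed labelling: w0 in M0, w1 in M1)."""
    b1_hat = dot(w1, y) / dot(w1, w1)                  # statistic of M1
    b0_hat = dot(w0, y) / dot(w0, w0)                  # OLS in M0
    b1_implied = dot(w1, w0) / dot(w1, w1) * b0_hat    # value implied by M0
    phi = b1_hat - b1_implied
    _, _, s2 = joint_fit(y, w0, w1)
    # Var(phi) under M0 is sigma^2 * (w1' M_w0 w1) / (w1'w1)^2
    w1_m_w1 = dot(w1, w1) - dot(w0, w1) ** 2 / dot(w0, w0)
    se = math.sqrt(s2 * w1_m_w1) / dot(w1, w1)
    return phi, phi / se

y = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]
w0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w1 = [1.0, 0.0, 2.0, 1.0, 3.0, 1.0]
print(encompassing_t(y, w0, w1))
```

In this scalar case the normalized statistic coincides algebraically with the ordinary t-ratio on `w1` in the regression of `y` on both regressors, which illustrates the remark that encompassing reduces to a familiar test when the models are nested in a common model.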

8.2.3 Comparison of P-values

Mizon (1984) has explained the encompassing in terms of the rejection and non-rejection dichotomy, but I shall consider it in terms of the P-value. Indeed if the limiting distribution contains no unknown parameters of $M_0$, we can determine

(8.7)

(8.8)


the asymptotic P-value of an observed value of $S_{01}$. Provided that the finite sample distribution of $S_{01}$ is well approximated by the limiting distribution, the P-value is an indicator to show the likelihood of an observed value of the statistic of $M_1$ if the DGP were $M_0$, or the degree in which behaviours of a statistic of $M_1$ can be interpreted by $M_0$.

So far we have investigated how well $M_0 \xrightarrow{\mathcal{E}} M_1$. In comparing mutually non-nested and separate $M_0$ and $M_1$ we must also examine how well $M_1 \xrightarrow{\mathcal{E}} M_0$ and form our judgement on the basis of both examinations. The encompassing, $M_1 \xrightarrow{\mathcal{E}} M_0$, is analysed through a statistic, $S_{10}$, and the asymptotic P-value of $S_{10}$ is $P_{10}$ if the DGP is $M_1$.

In general the P-values are random variables because they are functions of observations. The joint p.d.f. of the bivariate random variable $(P_{01}, P_{10})$ is sketched in Figure 8.2. The DGP is a member of $M_0$ in Figure 8.2(a). If the asymptotic theory provides a good approximation, $(P_{01}, P_{10})$ is distributed close to the $P_{01}$-axis, and the marginal densities of $P_{01}$ are uniform over [0, 1]. It can be roughly said that the higher the testing power of $S_{10}$ is, the faster the marginal p.d.f. of $P_{10}$ declines as we move away from the origin. The DGP is a member of $M_1$ in Figure 8.2(b). $(P_{01}, P_{10})$ is distributed close to the $P_{10}$-axis, and the marginal densities of $P_{10}$ are uniform.

If the observation at hand reveals large $P_{01}$ and small $P_{10}$, it may be reasonable to judge that $M_0$ fits the data better than $M_1$. It is possible in small probabilities that both $P_{01}$ and $P_{10}$ are small, no matter which of $M_0$ and $M_1$ may contain the DGP. If that happens we would be unable to make a comparison between $M_0$ and $M_1$. If both $P_{01}$ and $P_{10}$ are large, it would mean that neither $M_0$ nor $M_1$ is the DGP.

I propose that the congruence of $M_0$ and $M_1$ to the observations be compared through $P_{01}$ and $P_{10}$. $M_0$ is judged superior to $M_1$ if and only if $P_{01} > P_{10}$, and vice versa. $P_{01}$ and $P_{10}$ are determined by the limiting null distributions of $S_{01}$ and $S_{10}$ respectively.

FIG. 8.2 Illustrations of joint distributions of P-values: (a) DGP is a member of $M_0$, (b) DGP is a member of $M_1$

This criterion is fair if the joint distributions in Figures 8.2(a) and (b) are symmetric about the 45-degrees line connecting (0, 0) and (1, 1) on the $(P_{01}, P_{10})$ plane. Or, summarizing the joint distributions compactly, the criterion is thought fair if the probability that $P_{01} < P_{10}$ in Figure 8.2(a) and the probability that $P_{01} > P_{10}$ in Figure 8.2(b) are equal. Here the fairness is judged on the basis of losses due to two kinds of wrong judgement. The former probability is that of deciding $M_1$ superior to $M_0$ when the DGP in fact belongs to $M_0$. The latter probability is that of deciding $M_0$ superior to $M_1$ when the DGP belongs to $M_1$. If the probability that $P_{01} < P_{10}$ in Figure 8.2(a) exceeds the probability that $P_{01} > P_{10}$ in Figure 8.2(b), our criterion has a bias in favour of $M_1$.

When the DGP belongs to $M_0$ the probability of correct judgement is that of $P_{01} > P_{10}$ in the joint distribution of $(P_{01}, P_{10})$ in Figure 8.2(a), and the performance of our model comparison under $M_0$ can be measured by this probability. The minimum requirement on the performance is that the probability of $P_{01} > P_{10}$ exceeds 0.5, because 0.5 can be achieved by tossing a coin without ever analysing relevant data.
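The decision rule and the two summary measures just described (probability of correct judgement, and fairness as equality of the two error probabilities) can be written down directly. The sketch below is illustrative: the function names and the explicit handling of exact ties are choices made here, and the P-value pairs are made-up numbers, not simulation results.

```python
def judge(p01, p10):
    """Rule of the text: M0 is judged superior iff P01 > P10, and vice versa.
    Exact ties are reported separately (a choice made for this sketch)."""
    if p01 > p10:
        return "M0"
    if p10 > p01:
        return "M1"
    return "tie"

def prob_correct(pairs, dgp):
    """Share of (P01, P10) pairs for which the rule picks the true model."""
    hits = sum(1 for p01, p10 in pairs if judge(p01, p10) == dgp)
    return hits / len(pairs)

def bias_toward_m1(pairs_under_m0, pairs_under_m1):
    """P(choose M1 | DGP in M0) minus P(choose M0 | DGP in M1).
    Zero corresponds to the fairness condition; positive favours M1."""
    err0 = sum(1 for a, b in pairs_under_m0 if judge(a, b) == "M1")
    err1 = sum(1 for a, b in pairs_under_m1 if judge(a, b) == "M0")
    return err0 / len(pairs_under_m0) - err1 / len(pairs_under_m1)

# Hypothetical P-value pairs, for illustration only
under_m0 = [(0.60, 0.01), (0.30, 0.02), (0.04, 0.10), (0.90, 0.20)]
under_m1 = [(0.01, 0.50), (0.20, 0.10), (0.03, 0.70), (0.02, 0.40)]
print(prob_correct(under_m0, "M0"), prob_correct(under_m1, "M1"),
      bias_toward_m1(under_m0, under_m1))
```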

Usually $M_0$ and $M_1$ are each parametric families of models. When the DGP belongs to $M_0$, the asymptotic (marginal) distribution of $P_{01}$ is invariant throughout different members of $M_0$, but the finite sample distribution is not. As for $P_{10}$ even the asymptotic (marginal) distribution is not invariant throughout different members of $M_0$, one of which is supposed to be the DGP. Therefore the finite sample joint distribution of $(P_{01}, P_{10})$ depends not just on which of $M_0$ and $M_1$ contains the DGP, but also on the particular parameter value of $M_0$ and/or $M_1$ that represents the DGP.

8.2.4 Various encompassing tests

We have two kinds of comparison between a non-nested pair of models in Figure 8.1(a). The first is comparisons within DS models, for example, between $M_0^{1s}$ and $M_0^2$, and also within TS models, for example, between $M_1^{1s}$ and $M_1^2$. The second kind of comparison is between a member of DS models and a member of TS models. The ultimate aim of our study, i.e. the discrimination between the DS and TS, is the second kind.

The F-test explained in Section 7.1 is an encompassing test for $M_0^1 \xrightarrow{\mathcal{E}} M_1^1$. (7.1) is an AR version of $M_1^1$, and the OLS estimator of $(\beta, \alpha_1)$ in (7.2) is a statistic of $M_1^1$. Under $M_0^1$ we know that $(\hat\beta, \hat\alpha_1)$ converges in probability to (0, 1). Therefore $\hat\phi' = (\hat\beta, \hat\alpha_1 - 1)$ is an encompassing statistic for $M_0^1 \xrightarrow{\mathcal{E}} M_1^1$. Being a vector statistic its normalization is made through taking a quadratic form of $(\hat\beta, \hat\alpha_1 - 1)$ with the inverse of a covariance matrix, and the result is what is known as the F-statistic to test for $(\beta, \alpha_1) = (0, 1)$.

As for the analysis of $M_0^0 \xrightarrow{\mathcal{E}} M_1^0$ we can use the F-test given in Section 7.1 for the case where the deterministic trend is confined to a constant. However, the modes of deterministic trends need not be identical between $M_0$ and $M_1$. For $M_0^0 \xrightarrow{\mathcal{E}} M_1^1$ the encompassing may be examined by the F-test for $(\mu, \beta, \alpha_1) = (0, 0, 1)$ in (7.2) because $\rho = 1$ forces $\mu$ to zero. Similar F-tests can be used

forAfj4-M}-v, M^M[C, M04>MJCJ, M10^>M\S, M\^M\, M\^M\ etc. ThS

test statistics and their limiting distributions are shown in Hatanaka and Koto (1994) and Koto and Hatanaka (1994).

Interpretation of the MA unit-root test as an encompassing test is more complicated. Consider the model and notations in Section 3.4, especially the simplest case, (3.11a) and (3.11b), where $\Delta x_t$ is MA(1). The entire models may be indexed by $\delta$. The stationary $M_1^0$ is represented by $\delta = 1$. A class of non-stationary models having a common value of $\delta$ and arbitrary values of $\theta_0$ may be represented by that value of $\delta$. Let $\sigma_\varepsilon^2\Omega(\delta)$ be the covariance matrix of $x$ for a given $\delta$. Then $\Omega(1) = I$. Let $\hat\theta_0(\delta)$ be the GLS estimator of $\theta_0$ for a given $\delta$, and $v(\delta) = x - \hat\theta_0(\delta)e$. Then $v(1)$ equals $v$ in Section 3.4. $v(\delta)'\Omega(\delta)^{-1}v(\delta)$ is a statistic of the class of models indexed by $\delta$. According to Saikkonen and Luukkonen (1993a) $v'DD'v$ is the second-order derivative of $v(\delta)'\Omega(\delta)^{-1}v(\delta)$ with respect to $\delta$ evaluated at $\delta = 1$. The above general definition of encompassing has dealt with a distance between a statistic $\hat\beta_1$ and its evaluation $\hat\beta_{1(0)}$, but we may extend the idea to the curvature.5

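Extending the above reasoning to the case where $\Delta x_t$ is an ARMA, (3.14') with $\gamma$ replaced by $\hat\gamma$ can be used for the study of $M_1^0 \xrightarrow{\mathcal{E}} M_0^0$. Moreover, by subtracting a linear trend from the original series as explained in Section 6.3 the MA unit-root test can be used for $M_1^1 \xrightarrow{\mathcal{E}} M_0^1$. Hatanaka and Koto (1994) shows the limiting distributions to be used for $M_1^{1s} \xrightarrow{\mathcal{E}} M_0^{1s}$, $M_1^{1c} \xrightarrow{\mathcal{E}} M_0^{1c}$, $M_1^2 \xrightarrow{\mathcal{E}} M_0^2$, $M_1^3 \xrightarrow{\mathcal{E}} M_0^3$ etc.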

The Saikkonen-Luukkonen test is useful to investigate how well a TS model encompasses a DS model when the two share an identical mode of deterministic trends, for example, $M_1^1 \xrightarrow{\mathcal{E}} M_0^1$ and $M_1^{1s} \xrightarrow{\mathcal{E}} M_0^{1s}$. But the test cannot be used when the two models have different modes of deterministic trends. I shall present a method that can be used for the encompassing from a TS model to a DS model in so far as the mode of deterministic trend in the TS model is more inclusive than or identical to that in the DS model. I shall illustrate the method by $M_1^1 \xrightarrow{\mathcal{E}} M_0^1$. The DGP is

where $\{v_t\}$ is stationary with $E(v_t) = 0$ and $E(v_t^2) = \sigma_v^2$. The sample mean of $\Delta x_t$, $\hat\mu$, is a statistic of $M_0^1$. Evaluated under $M_1^1$ we see that plim $\hat\mu = \theta_1$, and $\theta_1$ may be estimated by OLS along (8.9), which is denoted by $\hat\theta_1$. Let $\hat\phi = \hat\mu - \hat\theta_1$. Then we have an identity

5 The point optimal MA unit-root test in Saikkonen and Luukkonen (1993b) can be interpreted as an encompassing test in a simpler manner, but my work was nearly completed when it became available to me.

(8.9)

(8.10)

where $(e_1, \ldots, e_T)$ is the vector of residuals in the OLS. Under $M_1^1$

(8.11)

If $\{v_t\}$ is Gaussian, the right-hand side of (8.11) is $N(0, 2\sigma_v^2)$. If $M_0^1$ were the DGP, $(T - 1)\hat\phi$ diverges at the speed of $\sqrt{T}$.

To use (8.10) for $M_1^1 \xrightarrow{\mathcal{E}} M_0^1$ it is necessary to design an estimator of $\sigma_v^2$, which is consistent under $M_1^1$ and remains bounded in probability under $M_0^1$. This is because the test statistic is not $(T - 1)(\hat\mu - \hat\theta_1)$, but

(8.12)

where

Since the last term is

and if $\hat\sigma_v^2$ diverges at the speed of $T$ the test would lack the consistency. One way to construct such $\hat\sigma_v^2$ is as follows. I shall write below as though $\{v_t\}$ is observable. In so far as the mode of deterministic trend in $M_1$ includes that in $M_0$, $(v_1, \ldots, v_T)$ can be estimated by $(e_1, \ldots, e_T)$. Fit an AR model to $\{v_t\}$, calculate the dominating root, $\rho$, of the characteristic polynomial, and choose a sufficiently large $n$ such that $|\rho|^n$ can be regarded zero, assuming that $|\rho| < 1$. Let $\gamma' \equiv (\gamma_0, \gamma_1, \ldots, \gamma_{n+1})$ be the vector of autocovariances of $\{v_t\}$ for lags $0, 1, \ldots, n + 1$. The first element, $\gamma_0$, is $\sigma_v^2$. Let $\gamma_\Delta' = (\gamma_{\Delta 0}, \gamma_{\Delta 1}, \ldots, \gamma_{\Delta n})$ be the vector of autocovariances of $\{\Delta v_t\}$ for lags $0, 1, \ldots, n$, and note that $\gamma_\Delta$ can be estimated consistently from $\{\Delta v_t\}$, no matter whether the DGP is $M_1^1$ or $M_0^1$. The two vectors $\gamma$ and $\gamma_\Delta$ are related through a known matrix. Let $e_i$ be the $i$th unit vector of $(n + 1)$ elements, $F_1$ be $(n + 1) \times (n + 1)$ and equal to $2I_{n+1} - L^* - L^{*\prime} - e_1 e_2'$, where $L^*$ has unity one line below the main diagonal and zeros elsewhere, and $F = (F_1, -e_{n+1})$. Then $F\gamma = \gamma_\Delta$. Assuming $\gamma_{n+1} = 0$ and thus writing $\gamma' = (\gamma_*', 0)$, we obtain $F_1\gamma_* = \gamma_\Delta$, which can be solved for the first element of $\gamma_*$, i.e. $\sigma_v^2$. The solution is bounded even under $M_0^1$ because the estimate of $\gamma_\Delta$ is, even though the assumption that $\gamma_{n+1} = 0$ is invalid under $M_0^1$.

This kind of test can be performed for $M_1^{1s} \xrightarrow{\mathcal{E}} M_0^1$, $M_1^{1c} \xrightarrow{\mathcal{E}} M_0^1$, and moreover extended to $M_1^{i+1} \xrightarrow{\mathcal{E}} M_0^i$, $i = 1, 2$, by modifying $\hat\mu$ as the sample mean of $\Delta^i x_t$. Relations analogous to (8.10) always hold. I shall call this kind of test the $\hat\mu$ test.

8.3 Simulation Studies on the Comparison of P-values

There are at least three reasons why simulation studies are needed on the comparison of P-values proposed in Section 8.2.3 above.

1. All the encompassing tests given in Section 8.2.4 are based upon asymptotic theories, whereas tests will be applied to the time-series data with $T$ ranging from 80 to 180. The only feasible determination of P-values is to assume that the exact, finite sample distributions of test statistics are identical to the limiting distributions under the null hypotheses. Discrepancies between the finite sample and the limiting distributions are revealed as deviations of the marginal distributions of $P_{01}$ and/or $P_{10}$ from the uniform distributions, and the deviations in turn might result in asymmetry of the joint p.d.f.s about the 45-degrees line between Figures 8.2(a) and (b).

2. Related to (1) above is a possible difference between the powers of the test for DS against TS and the test for TS against DS, which would result in asymmetry of the joint p.d.f.s between Figures 8.2(a) and (b).

3. Earlier in Section 2.4 I said that the DS and TS hypotheses are not separate. A serious problem is how far apart the values of discriminating parameters should be in order to differentiate the DS and TS hypotheses with $T = 80$ to 180. The near-boundary cases between the DS and TS consist of the TS models that are close to DS and DS models that are close to the TS. The former can be represented by the dominating root of characteristic polynomials of AR for $x_t$ being close to unity. The latter can be represented by the spectral density of $\Delta x_t$ in the Schwert ARMA declining as one moves on the frequency domain towards its origin.

In the model

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = 1$, (1) if $\delta = \rho$ (8.13) is a random-walk model for $\{x_t\}$; (2) if $\delta = 1$ and $|\rho| < 1$, (8.13) is a stationary AR(1) with root, $\rho$; and (3) if $1 > \delta > \rho > -1$, (8.13) is a Schwert ARMA for $\Delta x_t$ defined in Section 7.2.2, generating $\{x_t\}$ that is DS but close to TS.
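The three types of DGP can be generated from one recursion. The displayed form of (8.13) is not legible in this copy, so the sketch assumes the reading $(1 - \rho L)\Delta x_t = (1 - \delta L)\varepsilon_t$, which reproduces each of the three cases listed above. Zero start-up values are also an assumption (so the stationary case starts at zero rather than from its stationary distribution), and the function names are illustrative.

```python
import random

def simulate_813(rho, delta, shocks):
    """Generate x_1..x_T from (1 - rho L) dx_t = (1 - delta L) e_t,
    the assumed reading of (8.13), with zero start-up values."""
    x, path = 0.0, []
    dx_prev, e_prev = 0.0, 0.0
    for e in shocks:
        dx = rho * dx_prev + e - delta * e_prev
        x += dx
        path.append(x)
        dx_prev, e_prev = dx, e
    return path

def draw_shocks(T, seed=None):
    """i.i.d. standard normal shocks (mean 0, variance 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(T)]

shocks = draw_shocks(100, seed=1)
random_walk = simulate_813(0.5, 0.5, shocks)   # (1) delta = rho
stationary = simulate_813(0.8, 1.0, shocks)    # (2) delta = 1, |rho| < 1
schwert = simulate_813(0.0, 0.8, shocks)       # (3) 1 > delta > rho > -1
print(random_walk[-1], stationary[-1], schwert[-1])
```

With $\delta = 1$ the common factor $(1 - L)$ cancels, so the recursion reduces to $x_t = \rho x_{t-1} + \varepsilon_t$ under the zero start-up values.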

Time-series data with $T = 100$ are generated using each of the above three types of DGP. The augmented Dickey-Fuller test for $M_0^0 \xrightarrow{\mathcal{E}} M_1^0$ and the Saikkonen-Luukkonen test for $M_1^0 \xrightarrow{\mathcal{E}} M_0^0$ are run for each one of the time series. The superscripts 0 on these $M$s mean that the data are treated as though they might contain unknown constants. The Saikkonen-Luukkonen test is implemented as described in Section 3.4.2, especially after (3.15'). Lag orders in the AR fitting are determined by the general-to-specific principle as described in Section 7.2.1 with 5% significance level.6 The $p_{\max}$ in types (1) and (2) is 10, and that in (3) is 20.7 The AR representation of (8.13) in the form of (6.21) is given by

6 In fact we choose the lag order one higher than the one determined by the general-to-specific principle.

7 Since we are not supposed to know the DGP we should adopt the same $p_{\max}$ in the three types of DGP. We intended to save the computing time as it increases with $p_{\max}$.

(8.13)


How large lag orders are required in AR approximations depends upon $|\rho - \delta|$ as well as $\delta$.

Compiling 5,000 replications of simulation results, Figures 8.3-8.7 show the joint distributions of P-values of the Dickey-Fuller statistic and the Saikkonen-Luukkonen statistic. These P-values will be written $P_{DF}$ and $P_{SL}$ respectively. The DGP in Figure 8.3 is a random walk, the case where $\delta = \rho$ in (8.13). The joint distribution in this figure is found just as we should expect from the asymptotic theory. Most of the probability is concentrated in the region in which $P_{SL}$ is low, indicating that the Saikkonen-Luukkonen test is powerful against the random walk even when the lag order is determined from data. $P_{DF}$ is distributed roughly uniformly, which means that the finite sample distribution of the Dickey-Fuller statistic with $T = 100$ is well approximated by its limiting distribution even when the lag order is determined from data. The probability that $P_{DF} > P_{SL}$ is 0.85. Our criterion provides in high probability the correct judgement in discriminating DS and TS.

The DGP in Figure 8.4 is a case of Schwert ARMA, $\delta = 0.95$, $\rho = 0.928$ in (8.13), of which the spectral density function is graphed in Figure 7.1(a). It is a DS model that is close to TS. The joint distribution in Figure 8.4 reveals fairly good performance of our encompassing procedure. The probability of correct judgement, i.e. $P_{DF} > P_{SL}$, is 0.77, which is only slightly worse than in the previous random walk. The Dickey-Fuller test in the mode of Said and Dickey (1984) has its finite sample null distribution close to the asymptotic distribution when the AR lag orders are chosen as prescribed above. The Saikkonen-Luukkonen test has adequate power.

The DGP in Figure 8.5 is another case of Schwert ARMA, $\delta = 0.8$, $\rho = 0$ in (8.13), of which the spectral density function is shown in Figure 7.1(b). This is a model investigated in Schwert (1989), and may be thought closer to TS than the previous one is. Here the probability that $P_{DF} > P_{SL}$ is 0.70, indicating performance worse than in the previous Schwert ARMA. While the Saikkonen-Luukkonen test retains adequate power, the Dickey-Fuller test in the mode of Said and Dickey (1984) has a large size distortion, indicating that the probability of $0 < P_{DF} < 0.05$ is as large as 0.30, and causing a sharp peak of joint probabilities of $(P_{DF}, P_{SL})$ at the origin. Nevertheless one would perhaps be relieved by the probability of correct judgement being 0.70, because the result in Schwert (1989) was miserable. In part this is due to our encompassing analysis performed in both DS $\xrightarrow{\mathcal{E}}$ TS and TS $\xrightarrow{\mathcal{E}}$ DS whereas Schwert (1989) had DS $\xrightarrow{\mathcal{E}}$ TS only.

FIG. 8.3 Joint distributions of P-values (random walk)

FIG. 8.4 Joint distributions of P-values (Schwert ARMA, I)

FIG. 8.5 Joint distributions of P-values (Schwert ARMA, II)

The DGP in Figure 8.6 is a stationary AR(1) with $\rho = 0.8$, which is taken as a value somewhat displaced from $\rho = 1$. Most of the probability is assembled in the region in which $P_{DF}$ is small, indicating that the Dickey-Fuller test is powerful against $\rho = 0.8$. But the power seems to be less than that of the Saikkonen-Luukkonen statistic against $\rho = 1$ revealed in Figure 8.3. The marginal distribution of $P_{SL}$ is somewhat distorted from being uniform over [0, 1]. There is some size distortion when the Saikkonen-Luukkonen test is implemented as in Section 3.4.2 with a lag order determined from data. Nevertheless the probability that $P_{SL} > P_{DF}$ is 0.85, and there is a high probability that our criterion provides the correct discrimination between DS and TS.

FIG. 8.6 Joint distributions of P-values (stationary AR, I)

The DGP in Figure 8.7 is a stationary AR(1) with $\rho = 0.9$, which is closer to the random walk than the DGP in Figure 8.6. A considerable probability is located in the region in which both $P_{SL}$ and $P_{DF}$ are low. The Dickey-Fuller statistic does not have power against $\rho = 0.9$, and the Saikkonen-Luukkonen statistic has a significant size distortion. The probability that $P_{SL} > P_{DF}$ is only 0.60, and it must be admitted that our criterion does not work. With $T = 100$, $\rho = 0.9$ presents a TS model that is sufficiently close to DS.

Common to all five figures is the finding that the region in which both $P_{DF}$ and $P_{SL}$ are larger than, say, 0.3 has virtually zero probability. If both P-values are found larger than 0.3 in our empirical studies, we should suspect model misspecification. The misspecification must be on some parts of the model other than the lag order, because the lag order has been determined from the data in all of the five figures.

FIG. 8.7 Joint distributions of P-values (stationary AR, II)

On the other hand it is seen that both P-values are low far more frequently in the DGPs (such as Figures 8.5 and 8.7) near the boundary between DS and TS than in the other DGPs that are definitely TS or definitely DS. Both P-values being low might be taken as indication of boundary cases.

Our criterion is fair in comparing the random walk, which is DS, and the stationary AR(1) with $\rho = 0.8$, which is TS. Two probabilities of wrong decisions, i.e. the probability that $P_{SL} > P_{DF}$ in the former model and the probability that $P_{SL} < P_{DF}$ in the latter, are both 0.15. However, our criterion is not fair when a model near the boundary between DS and TS is one part in comparison. When the random-walk model, which is DS, is compared with the AR(1) with $\rho = 0.9$, which is TS, the probability that $P_{SL} > P_{DF}$ in the former model is 0.15, while the probability that $P_{SL} < P_{DF}$ in the latter is 0.40. The probability of mistaking a TS model for DS exceeds the probability of mistaking the DS model for the TS. The bias is in favour of DS. Likewise there is a bias in favour of TS between a Schwert ARMA and AR(1) with $\rho = 0.8$. In short our criterion is fair in comparing separable models but unfair in comparing inseparable models, admittedly a rather obvious result.
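The fairness comparisons above are simple arithmetic on the probabilities of correct judgement reported for Figures 8.3-8.7. The sketch below only tabulates those reported figures; the dictionary keys, the function names, and the sign convention for the bias are choices made for the illustration.

```python
# Probabilities of correct judgement reported in the text for T = 100
reported = {
    "random walk": ("DS", 0.85),
    "Schwert ARMA I": ("DS", 0.77),
    "Schwert ARMA II": ("DS", 0.70),
    "AR(1) rho=0.8": ("TS", 0.85),
    "AR(1) rho=0.9": ("TS", 0.60),
}

def error_prob(name):
    """Probability of the wrong judgement when `name` is the DGP."""
    return round(1.0 - reported[name][1], 10)

def bias(ds_name, ts_name):
    """Error probability under the DS model minus that under the TS model.
    Positive: bias in favour of TS. Negative: bias in favour of DS."""
    return round(error_prob(ds_name) - error_prob(ts_name), 10)

for ds in ("random walk", "Schwert ARMA I", "Schwert ARMA II"):
    for ts in ("AR(1) rho=0.8", "AR(1) rho=0.9"):
        print(ds, "vs", ts, "bias:", bias(ds, ts))
```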

More simulation results are found in Koto and Hatanaka (1994). They include results on stationary ARs such as $(1 - 0.9L)(1 - 0.8L)x_t = \varepsilon_t$ and $(1 - 0.95L)x_t = \varepsilon_t$ for DGP. The probability of correct discrimination is less than 0.5, that is, worse than tossing a coin. Characteristic roots between 0.9 and 1.0 are not distinguishable from a unit root with $T = 100$,8 and this statement does not assume a particular unit of time.

8 Fukushige, Hatanaka, and Koto (1994) reached the same conclusion using a discrimination method different from the one adopted here.

9

Results of the Model Selection Approach

Based upon the Bayesian analysis to be reviewed in Chapter 10, Schotman and van Dijk (1991b) and Koop (1992) explicitly and Phillips (1991b) implicitly present a tentative conclusion that real economic variables are trend stationary in so far as the US historical data are concerned. The difference stationarity is also questioned in Rudebusch (1992) using the resampling method, and in DeJong et al. (1992b) and Kwiatkowski et al. (1992),1 using what may be interpreted as a prototype encompassing method. As for the post-war annual data of real outputs a Bayesian analysis in Schotman and van Dijk (1993) reveals that those in eight out of sixteen OECD countries are DS, those in two countries are TS, and the rest undecided.

We shall investigate both the historical data and post-war quarterly data in the USA. As for the historical data we analyse real GNP, real wage, real rate of interest, and unemployment rate as the real economic variables, to which are added nominal GNP, CPI, stock price, nominal rate of interest, and nominal money stock. The basic parts of the data set are the Nelson-Plosser (1982) data updated by Schotman and van Dijk (1991b) to 1988.2 All variables other than unemployment rate and interest rate are in logarithmic scales.3 As for the post-war data of the USA we analyse real GDP, real consumption, real wage, real rate of interest, and unemployment rate, to which are added nominal GNP, CPI, stock price, and nominal money stock. All the post-war data are quarterly and seasonally adjusted, if required, by X-11 method.4 The nominal rate of interest is deleted from the post-war data set because of its conspicuous heteroskedasticity in $\Delta x_t$ and possibility that the integration order may not be constant over time.

The following is a summary of results that are presented in Hatanaka and Koto (1994) in greater detail. In all the tests we have adopted AR approximations to ARMA models, and the lag orders have been determined as described in Section 7.2.1 by the t-tests of highest-order coefficients.
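The lag-order rule mentioned above can be illustrated by a small sketch. Section 7.2.1 is not reproduced here, so the code below is only a generic general-to-specific search under stated assumptions: the regression includes an intercept, a linear trend, the lagged level, and k lagged differences; the critical value 1.96 and the function names `highest_lag_t` and `select_lag` are hypothetical choices made for illustration, not the book's own procedure.

```python
import numpy as np

def highest_lag_t(x, k):
    """t-statistic of the coefficient on the k-th lagged difference in a
    regression of dx_t on (1, t, x_{t-1}, dx_{t-1}, ..., dx_{t-k})."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                         # dx[i] = x[i+1] - x[i]
    n = dx.size
    y = dx[k:]                              # dependent variable
    cols = [np.ones(n - k),                 # intercept
            np.arange(k, n, dtype=float),   # linear time trend
            x[k:n]]                         # lagged level
    cols += [dx[k - j:n - j] for j in range(1, k + 1)]  # lagged differences
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (y.size - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)            # OLS covariance matrix
    return beta[-1] / np.sqrt(cov[-1, -1])

def select_lag(x, k_max, crit=1.96):
    """Start from k_max and drop the highest lag until its t-value is significant."""
    for k in range(k_max, 0, -1):
        if abs(highest_lag_t(x, k)) >= crit:
            return k
    return 0

# Illustrative data (simulated, not the book's): a unit-root series whose
# differences follow an AR(1) with coefficient 0.5.
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
d = np.zeros(500)
for t in range(1, 500):
    d[t] = 0.5 * d[t - 1] + e[t]
x_sim = np.cumsum(d)
chosen = select_lag(x_sim, k_max=6)
print("selected number of lagged differences:", chosen)
```

Because the first lagged difference is strongly significant in the simulated series, the search cannot fall through to zero; it may stop at a higher lag when a spurious t-value exceeds the threshold, which is one reason the results in this chapter are reported over a range of lag orders.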

1 Kwiatkowski et al. (1992) perform a type of MA unit-root test as well as the Dickey-Fuller test, though they do not refer to the encompassing analysis. Their MA unit-root test uses a non-parametric estimator of long-run variance.

2 The real rate of interest is not in Nelson and Plosser (1982).

3 See Section 2.5 on the logarithmic transformation.

4 See Section 7.3.4 on the seasonal adjustment.


9.1 Deterministic Trends Amenable to the DS and the TS

I begin with the selection of models for deterministic trends, which is to be made separately within the DS models and within the TS models. A word of warning is appropriate in this regard. When your eyes move along a rough contour of time-series charts such as Figure 9.1(a), you are tacitly assuming that the time series has a central line, i.e. it is trend stationary.

FIG. 9.1 Historical data of real GNP, log: (a) Original series, (b) Differenced series, (c) Deterministic trend for TS


Our model selection of deterministic trends is based on hypothesis testing, using equations such as (8.5) or (8.6). The results are that modes of the deterministic trends amenable to DS are often different from those amenable to TS, and that generally deterministic trends for DS are simpler, often lacking structural changes, than those for TS.⁵ Determination of modes of deterministic trends is often difficult for TS.

These points may be illustrated by the historical data of real GNP. The chart is given in Figure 9.1(a). Virtually all the past studies concluded that a linear trend with an intercept shift, $M_1^{1c}$, fits the data with 1930 as the breakpoint (1930 is the beginning of the new regime). Figure 9.1(b) shows the series of $\{\Delta x_t\}$, and we see no evidence of an outlier about the Great Crash or at any time. Moreover, tests with equations such as (8.6) reveal that neither the difference in means of $\Delta x_t$ before and after 1929 nor the linear (or quadratic) trend in $\Delta x_t$ is significant. Thus the best model of deterministic trend within the DS hypotheses is a simple, linear trend. On the other hand the test with equation (8.5) for TS models shows that the intercept shift is indeed significant but the slope change is not. Thus $M_1^{1c}$ with breakpoint at 1930 is the best model of deterministic trend within the TS hypothesis (see Figure 9.1(c)).

As stated at the beginning of Chapter 8, structural changes in deterministic trends are exogenous to the stochastic parts of the variables. There may be doubts about the Great Crash being regarded as exogenous to the stochastic part of real GNP. Such interpretation is necessitated if the stochastic part is to be stationary, but it is not if the stochastic part is non-stationary with a unit root.

In general, different models tend to explain different aspects of observations, but the models that explain a relatively wider range of aspects (i.e. wider endogeneity) are thought superior to other models. Moreover, models with constant parameters are also superior to models with time-varying parameters. In terms of these criteria the DS models are superior to the TS models.

Concerning the historical data other than real GNP, the real and nominal rates of interest and stock price present some difficulties in determining modes of deterministic trends, especially for the TS models. The difficulties are somewhat reduced in the post-war quarterly data.

9.2 Discrimination between TS and DS

The model selection by the encompassing principle compares different models in terms of their congruence with statistical observations. What seems to be the ideal procedure to discriminate DS and TS in the array, Figure 8.1(a), consists of the following steps: (i) assuming that the DGP is DS, choose the best model of deterministic trend in DS; (ii) assuming that the DGP is TS, choose the best model of deterministic trend in TS; (iii) investigate how well the DS model chosen in (i) encompasses the TS with the most general model of deterministic trends; and (iv) see how well the TS model chosen in (ii) encompasses the DS with the most general model of deterministic trends. The ultimate discrimination between DS and TS is made by comparing the encompassing capability in (iii) and (iv) measured by the P-values.

5 One exception is real rate of interest in the historical data.

This procedure cannot be adopted because I am unable to find a method that works well with $T = 100$ or so for step (iv), in which the model of deterministic trend in DS subsumes that in TS. I shall follow steps (i) and (ii) but then proceed to (iii') in which we see how well the DS model chosen in (i) encompasses the TS model chosen in (ii), and finally (iv') to see how well the TS model chosen in (ii) encompasses the DS model in (i). This procedure has a drawback in that selection of the best model of deterministic trends in (i) does not mean much when the DGP is TS. This selection is not used in (iv) but used in (iv'). Nevertheless step (iv') is feasible because we have not encountered the best model of deterministic trends in DS properly subsuming that in TS as explained in Section 9.1 above. Incidentally, step (iii) is feasible, but the symmetrical treatment of DS and TS necessitates modification of (iii) into (iii').

9.2.1 US historical data

In the historical data real GNP and real wage may be near the boundary between DS and TS; real rate of interest and unemployment rate are judged TS; CPI is I(1) but not I(2); and stock price, interest rate, and money stock are judged DS. These are the conclusions reached in Hatanaka and Koto (1994). Here I shall present the analysis on real GNP, 1909-88. It was found in the previous section that the best model of deterministic trend is a linear trend for DS while it is a linear trend with an intercept shift in 1930 for TS. Thus we shall perform our encompassing analysis between $M_0^1$ and $M_1^{1c}$.

Let us begin with $M_0^1 \to M_1^{1c}$. This is investigated using a type of the augmented Dickey-Fuller test. Consider

(9.1)

with $I_{TB+1}(t)$ defined below (8.1). If $\alpha_1 < 1$, (9.1) defines a linear trend with an intercept shift at 1930, i.e. $M_1^{1c}$. If $(\mu_1, \beta, \alpha_1) = (0, 0, 1)$, (9.1) represents a DS model with a linear trend, i.e. $M_0^1$. The F-test is run for $(\mu_1, \beta, \alpha_1) = (0, 0, 1)$. In Koto and Hatanaka (1994) the limiting distribution of the test statistic under $M_0^1$ is tabulated by a simulation method described in Section 7.3.1. It is assumed that $(T - TB)/T = \lambda$ is kept fixed while $T$ goes to infinity. Since (1988-1929)/(1988-1908) is close to 0.7, the table for $\lambda = 0.7$ is reproduced in Table 9.1, where $p$ and $x$ represent those in

$$p = \int_0^x f(\xi)\, d\xi$$

with $f(\cdot)$ denoting the limiting p.d.f. of the statistic.

TABLE 9.1 Limiting cumulative distribution of the statistic for $M_0^1 \to M_1^{1c}$, $\lambda = 0.7$

p  0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   0.95
x  1.45  1.79  2.08  2.26  2.66  2.98  3.35  3.83  4.56  5.23

To implement the F-test we first determine lag order $(p - 1)$ in (9.1), which is the order of AR other than $(1 - \rho L)$ as shown, for example, in (7.1) and (7.2). In the present data the highest lag order term, $\alpha_p$, has t-value equal to 2.04 with $(p - 1) = 7$. The F-test statistics for $(p - 1) = 7$ to 10 range between 7.0 and 9.0. Table 9.1 shows that their P-values are much less than 0.05.

As for $M_1^{1c} \to M_0^1$, we can use the $\hat\mu$ test. It was explained in (8.10) and (8.11) for $M_1^1 \to M_0^1$, and the method can easily be extended to $M_1^{1c} \to M_0^1$. The $\hat\mu$ is still the sample mean of $\Delta x_t$, which is a statistic of $M_0^1$. Since $M_1^{1c}$ has deterministic trend, $\theta_0 + \theta_1 I_{1930}(t) + \theta_2 t$, it is seen that plim $\hat\mu = \theta_2$, which is estimated by regressing $x_t$ upon $(1, I_{1930}(t), t)$. Analogously to (8.10) we have

where $(e_1, \ldots, e_T)$ are the residuals. It is again necessary that $\hat\sigma^2$ remains bounded under $M_0^1$. In the present case $n = 20$ is found adequate in estimating $\sigma^2$ by the method described below (8.12). The limiting distribution of the $\hat\mu$-test statistic is N(0, 1), and the P-value is about 0.2 in the present case.

The P-value in $M_1^{1c} \to M_0^1$ is larger than the P-value in $M_0^1 \to M_1^{1c}$, and we decide that $M_1^{1c}$, a member of TS models, fits the data better than $M_0^1$, a member of DS models. However, it is better to keep in mind that both of the two P-values are small, which may be indicative of a boundary case between DS and TS.⁶

As for the other real variables the real wage has the same deterministic trends that the real GNP has respectively in the DS and TS models. The P-value for DS is slightly larger than that for the TS, but both of the P-values are as low as 0.1 or 0.2. The real rate of interest has a constant trend in both the DS and TS models, and also an outlier in the differenced series. The encompassing analysis with the outlier included in the data shows the P-value for TS overwhelmingly larger than that for DS, but the difference is reduced when the outlier is eliminated by a dummy variable introduced in both the DS and TS models. The unemployment rate has a constant trend in both the DS and TS models, and the P-value for TS is about 0.7 while that for DS is below 0.05. These results on the historical data may be summarized as follows. Unemployment rate and real rate of interest are definitely TS. Formal applications of our comparison criteria lead to real GNP being TS, but this variable as well as real wage appears to be near the boundary between DS and TS.

6 Simulation results in Koto and Hatanaka (1994) lead us to think that the innovation variance in the pre-war period being four times as large as that in the post-war would not bring about any bias in either direction.


We conclude that the data do not provide strong evidence to propose the unit roots to be incorporated in macrodynamics of real economic variables. The self-restoring force of the trend line can be neither assured nor refuted.

9.2.2 US Post-war data

All variables, real variables, prices, and financial variables, are DS in the post-war period. This is the conclusion reached in Hatanaka and Koto (1994).

Below I shall show the analyses of real consumption, unemployment rate, and CPI. As for the CPI we shall be concerned with whether it is I(2) or I(1).

The data of real consumption for 1949 I-1992 IV are shown in Figure 9.2(a). To select the deterministic trend within DS models we initially look for outliers in $\Delta x_t$, which is shown in Figure 9.2(b). We have decided that there is none. Then (8.6) is used to see if the difference in means between the periods before and after the Oil Crisis is significant, and we have found it barely is. Thus we have chosen a linear trend with a slope change at 1973 I, $M_0^{1s}$. As for the deterministic trend within the TS models the analysis on the basis of (8.5) shows that a slope change at 1973 I is significant. Thus we shall compare $M_0^{1s}$ and $M_1^{1s}$ in the encompassing analysis.

As for $M_0^{1s} \to M_1^{1s}$ we use a type of Dickey-Fuller test based on (8.4), ignoring |, therein. The F-test is run for $(\beta_0, \beta_1, \alpha_1) = (0, 0, 1)$. The limiting distribution of the test statistic under $M_0^{1s}$ is tabulated in Koto and Hatanaka (1994). Noting that $\lambda = (T - TB)/T$ is about 0.5, we cite the distribution for $\lambda = 0.5$ in Table 9.2.

The coefficient of the highest lag order term, $\alpha_p$ in (8.4), has t-values equal to -1.96 and -2.23 for $(p - 1) = 23$ and 20 respectively. With $(p - 1) = 23$ the F-statistic is about 2.5. Comparing this with Table 9.2 we see that the P-value is about 0.8.

$M_1^{1s} \to M_0^{1s}$ is investigated by the Saikkonen-Luukkonen test. The linear trend with slope change is initially subtracted from $x_t$. The asymptotic distribution of the Saikkonen-Luukkonen test statistic depends on what mode of deterministic trend is subtracted. Koto and Hatanaka (1994) show the distribution in the present case (linear trend with slope change, $\lambda = 0.5$) as in Table 9.3. The AR fitting is adopted as described in Section 3.4.2 to construct the Saikkonen-Luukkonen test statistic (3.14'), and $p = 24$ is chosen for the AR lag order. It turns out that the test statistic is 0.059 in the present data. The P-value is less than 0.2.
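The way a P-value is read off a tabulated limiting cumulative distribution can be sketched as follows. The quantiles are those of Tables 9.2 and 9.3; the linear interpolation between tabulated points and the function name `upper_tail_p` are assumptions made for illustration, since the text does not say how intermediate values were obtained.

```python
import numpy as np

# Cumulative probabilities p shared by the tables of this chapter.
P_GRID = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
# Quantiles x from Table 9.2 (Dickey-Fuller type F-statistic, lambda = 0.5).
TABLE_9_2 = np.array([2.09, 2.51, 2.85, 3.18, 3.50, 3.86, 4.28, 4.82, 5.65, 6.42])
# Quantiles x from Table 9.3 (Saikkonen-Luukkonen statistic, lambda = 0.5).
TABLE_9_3 = np.array([0.021, 0.025, 0.029, 0.032, 0.037,
                      0.041, 0.047, 0.055, 0.069, 0.084])

def upper_tail_p(stat, quantiles, probs=P_GRID):
    """P-value = 1 - cdf(stat), with the cdf interpolated linearly.

    Outside the tabulated range np.interp clamps to the end points, so a
    returned 0.9 means 'at least 0.9' and 0.05 means 'at most 0.05'.
    """
    return 1.0 - np.interp(stat, quantiles, probs)

p_ds = upper_tail_p(2.5, TABLE_9_2)     # F-statistic of about 2.5
p_ts = upper_tail_p(0.059, TABLE_9_3)   # Saikkonen-Luukkonen statistic 0.059
print(round(p_ds, 2), round(p_ts, 2))
```

The interpolated values agree with the statements in the text: about 0.8 for the Dickey-Fuller type test and somewhat less than 0.2 for the Saikkonen-Luukkonen test.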

The P-value for $M_0^{1s} \to M_1^{1s}$ is much larger than the P-value for $M_1^{1s} \to M_0^{1s}$, and we conclude that the real consumption is DS in the post-war data.

I like to call readers' attention to very high lag orders that we may have to choose in analysing the seasonally adjusted quarterly data.

FIG. 9.2 Post-war data of real consumption, log: (a) Original series, (b) Differenced series, (c) Deterministic trend

TABLE 9.2 Limiting cumulative distribution of the statistic for $M_0^{1s} \to M_1^{1s}$, $\lambda = 0.5$

p  0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   0.95
x  2.09  2.51  2.85  3.18  3.50  3.86  4.28  4.82  5.65  6.42

TABLE 9.3 Limiting cumulative distribution of the statistic for $M_1^{1s} \to M_0^{1s}$, $\lambda = 0.5$

p  0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    0.95
x  0.021  0.025  0.029  0.032  0.037  0.041  0.047  0.055  0.069  0.084

The data of unemployment rate for 1949 I-1991 IV is shown in Figure 9.3(a). The unemployment rate is the only variable that was judged TS in the analyses of historical data by Nelson and Plosser (1982). In Figure 9.3(b) the mean of $\Delta x_t$ is not significantly different from zero in the analysis using an equation that resembles (8.6) but lacks $I_{TB+1}$. In Figure 9.3(c) the coefficient of time variable is not significant in an equation that lacks $I_{TB+1}$ and $\max(t - TB, 0)$ in (8.5) but resembles it otherwise. Thus the deterministic trends are just a constant in both the DS and TS. As for $M_0^0 \to M_1^0$ we find t-value equal to -2.99 for the highest lag order term for $(p - 1) = 12$. The F-statistics in the augmented Dickey-Fuller test range from 1.5 to 2.1 over $(p - 1)$ between 13 and 16. The limiting distribution is given in Table 9.4. The P-values are roughly between 0.4 and 0.6. As for $M_1^0 \to M_0^0$ the limiting distribution of the Saikkonen-Luukkonen statistic, (3.14'), is given in Table 9.5 for the case where a constant trend is subtracted from $x_t$. In the present data the statistic is 0.50 with $p = 13$, and the P-value is less than 0.05. We thus conclude that the unemployment rate is DS in the post-war data.

FIG. 9.3 Post-war data of unemployment rate: (a) Original series, (b) Differenced series, (c) Deterministic trend

As for some of the other variables the real GDP has P-value between 0.3 and 0.6 for DS, depending on what lag orders are chosen, and P-value between 0.05 and 0.10 for TS. There is some anxiety about the selection of deterministic trends in real wage, but between $M_0^1$ and $M_1^1$ the P-value for DS is about 0.5 while the P-value for TS is less than 0.05. The real rate of interest is difficult to analyse primarily in choosing the mode of deterministic trend for TS, another manifestation of the point emphasized in Section 9.1 above. We have abandoned the encompassing analysis of this variable. As for the nominal GNP, we found cubic trends significant in both the DS and TS. The P-values for DS are between 0.2 and 0.7, while the P-values for TS are much less than 0.05. As for the stock price we have compared $M_0^1$ and $M_1^1$. The P-value for DS is about 0.9, and that for TS is much less than 0.05. The conclusion is that real variables, price, and financial variables are all DS in the US post-war data except for some that we are unable to analyse.

It has also been found that none of the post-war data has both P01 and P10 small, which means that none of them seems to be near the boundary between DS and TS. In particular none belongs to the Schwert ARMA. This should be contrasted to the historical data, in which real GNP and real wage may be judged near the boundary between TS and DS.

Earlier on the historical data we were unable to decide if the real economic variables were DS except for unemployment rate and real rate of interest. Regarding the different results between the historical and post-war data there cannot be an unequivocal explanation because the pre-war data alone are too short for a separate analysis. However, several conjectures are possible. First, economic regimes are different between the pre-war and post-war periods. Second, historical data are annual and cover longer time-spans, whereas post-war data are quarterly spanning shorter periods. If the DGP is TS, and if one quarter is short enough to assure $b(1) = 0$, then annual data covering longer time-spans would have higher discriminatory power to discern DS and TS (see Section 2.5). It would lead to the interpretation that real variables are indeed TS but the post-war data cannot reveal it. The dominating characteristic root is about 0.95 on the quarterly basis, which is equivalent to 0.81 on the annual basis. The root, 0.95, cannot be distinguished from unity with $T = 150$, but 0.81 can be with $T = 100$.

TABLE 9.4 Limiting cumulative distribution of the statistic for $M_0^0 \to M_1^0$

p  0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   0.95
x  0.68  0.95  1.19  1.45  1.73  2.04  2.42  2.93  3.79  4.58

TABLE 9.5 Limiting cumulative distribution of the statistic for $M_1^0 \to M_0^0$

p  0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    0.95
x  0.046  0.062  0.079  0.097  0.118  0.146  0.184  0.240  0.344  0.459
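The equivalence between the quarterly root 0.95 and the annual root 0.81 is simple compounding over four quarters, as the following sketch shows. The function name `convert_root` is hypothetical and introduced only for this illustration.

```python
def convert_root(root, periods):
    """Characteristic root over `periods` short periods implied by a
    root measured per short period: root ** periods."""
    return root ** periods

# A quarterly root of 0.95 compounded over the four quarters of a year.
annual_root = convert_root(0.95, 4)
print(round(annual_root, 2))
```

The annual figure moves visibly away from unity even though the underlying persistence is the same, which is the sense in which annual data over a long span may discriminate better than quarterly data over a short one.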

At any rate it is the post-war data rather than the historical data that are used for most econometric studies. There we must use statistical inference procedures that would be appropriate when unit roots may possibly be involved. Part II will show that the treatment of even a simple regression model should be different from the traditional econometrics when unit roots are involved. However, it is also important to bear in mind the uncertainty about the characteristic roots between 0.9 and 1.0 mentioned above.

9.2.3 Integration order of the post-war CPI

The integration order of log of CPI has been an issue in the unit-root field. No one has questioned that it is at least unity. The issue is whether it is 2 or not. As for the historical data Hatanaka and Koto (1994) find strong evidence to support that CPI is not I(2).

Let us turn to the integration order of log of CPI in the post-war period. The same analysis as before is now applied to $\{\Delta x_t\}$ rather than $\{x_t\}$ in order to determine whether $\{x_t\}$ is I(2) or I(1). Initially the chart of $\{\Delta^2 x_t\}$ starting at 1949 I has been examined, and an outlier is found in 1951 I. Since interpretation of this outlier seems to pose a problem, and since 1951 I is so close to the beginning of the post-war era, we have chosen 1952 I-1992 IV as our sample period, for which charts are given in Figure 9.4.

As for the deterministic trend in DS a linear trend is not significant, but a constant term is significantly non-zero in (8.6). As for TS a quadratic trend is significant in (8.5). We have compared $M_0^1$ and $M_1^2$. The coefficient of highest lag order term for $(p - 1) = 7$ is highly significant. The Dickey-Fuller statistics are not stable as lag orders are varied, and, when compared with Table 9.6, P-values range between 0.4 and 0.7 over $(p - 1) = 9$ to 14. The $\hat\mu$ test has P-value, 0.4. As a supplement we have also compared $M_0^2$ and $M_1^2$. The P-values of Dickey-Fuller statistics are similar to the above, and the Saikkonen-Luukkonen statistic has P-value, 0.2. The inflation rate may well be DS, and CPI may be I(2) in the post-war period.

FIG. 9.4 Post-war data of inflation rate: (a) Original series, (b) Differenced series, (c) Deterministic trend

TABLE 9.6 Limiting cumulative distribution of the statistic for $M_0^1 \to M_1^2$

p  0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   0.95
x  1.63  2.00  2.32  2.62  2.93  3.28  3.67  4.17  4.95  5.67

As for money stock, its historical data show that log M is I(1) but not I(2). As for the post-war data we find it difficult to decide whether it is I(2) or I(1).

Incidentally, Japanese real GNP and stock price in Figures 1.1 and 1.2 are both I(1), if the breakpoint is set in 1974,⁷ but there is some uncertainty about the choice of breakpoint (see Takeuchi, 1991).

7 The highest-order coefficient in the Dickey-Fuller equation is significant at $(p - 1) = 11$, and the P-values of the Dickey-Fuller statistic are based upon lag orders $(p - 1)$ larger than 11.


10

Bayesian Discrimination

The present Chapter surveys and assesses Bayesian studies of the unit root. Section 10.1 sets forth the notations. Section 10.2 presents three different categories of discrimination that have emerged in the Bayesian unit-root literature. Section 10.3 surveys the literature on the first category, in which stationarity is discriminated from non-stationarity. It is not directly related to the discrimination between DS and TS, but it is this category that both sides of the controversy between Sims (1988) and Phillips (1991b) refer to. It will be revealed that the prior should be model-dependent and hence a flat prior is ruled out, and that the Bayesian inference is standard in so far as the likelihood function is concerned, which should be contrasted to the classical inference equipped with a battery of non-standard limiting distributions. Section 10.4 explains problems that arise in the second and third categories of discrimination, in which a point-null hypothesis, $\rho = 1$, is compared with alternative hypotheses. It will be shown that the point-null hypothesis constrains our choice of prior distributions, virtually ruling out non-informative priors. The third category, which contains only Schotman and van Dijk (1991a, b, 1993), does deal with the discrimination between the DS and TS. The results are surveyed in Section 10.5. My survey does not include the recent literature on the Bayesian model selection such as Koop and Steel (1994), Phillips (1992b), Phillips and Ploberger (1992), Stock (1992), and Tsurumi and Wago (1993) due to lack of time.

10.1 Differences between the Bayesian and the Classical Theories

I assume some acquaintance with an elementary part of the Bayesian statistics on the reader's part, but some essential points will be mentioned here to compare them with the classical statistics. Differences between the two schools of statistics begin with the concept of probability. In the classical school the probability is the relative frequency of an event in a hypothetical sequence of experiments. In the Bayesian, the probability is the degree in which an event is thought likely to occur, or a proposition likely to hold, on the basis of someone's judgement. The Bayesian inference starts with a prior p.d.f., $f(\theta)$, of a parameter vector, $\theta$, whereas the classical inference cannot incorporate such concept. The likelihood function may be written $f(x \mid \theta)$ in both schools, but here too is revealed an important difference. The $x$ in the Bayesian likelihood is the actual observation.¹ The $x$ in the classical likelihood is a random variable. Not only the actual observation of $x$ but also all the $x$ that might be observed are brought together to form a distribution of $x$ in the sense of relative frequency. Therefore $f(x \mid \theta)$ is a stochastic function of $\theta$, from which are derived probability distributions of the first- and the second-order derivatives of the likelihood function, which in turn form the basis of the classical inference. On the other hand the Bayesian inference is based on the posterior p.d.f.,

$$f(\theta \mid x) = \frac{f(x \mid \theta) f(\theta)}{\int f(x \mid \theta) f(\theta)\, d\theta}. \qquad (10.1)$$

Here again $x$ is the actual observation.

The basic ingredients in the Bayesian inference are the prior distribution and the likelihood function. It will be seen later that it is the prior distribution that raises a number of difficult problems in the unit-root field.

Suppose that we wish to test $H_i$, $i = 0, 1$, where $H_i$ states that $\theta \in \Theta_i$. Prior probabilities of $H_i$ are

$$\pi_i = P(\theta \in \Theta_i), \quad i = 0, 1,$$

which may be further distributed each within the $\Theta_i$ as $\pi_i f_i(\theta)$, $i = 0, 1$, where

$$\int_{\Theta_i} f_i(\theta)\, d\theta = 1.$$

Posterior probabilities of the two hypotheses are

$$\alpha(\Theta_i) = \frac{\pi_i \int_{\Theta_i} f(x \mid \theta) f_i(\theta)\, d\theta}{f(x)}, \quad i = 0, 1, \qquad (10.2)$$

where

$$f(x) = \sum_{i=0}^{1} \pi_i \int_{\Theta_i} f(x \mid \theta) f_i(\theta)\, d\theta. \qquad (10.3)$$

Both $\pi_i$ and $\alpha(\Theta_i)$, $i = 0, 1$, are probabilities of $H_i$ being true, and the transformation from $\pi_i$ to $\alpha(\Theta_i)$ is brought about by assessing the observation, $x$, in light of the model and the observation.

The Bayesian inference often uses a p.d.f. such that its integral diverges to infinity as the domain of integration expands infinitely. Such a p.d.f. is said to be improper. When an improper p.d.f., $f(\theta)$, is adopted as a prior distribution for all $\theta$ in $\Theta_0$ and $\Theta_1$, the prior probabilities of $H_0$ and $H_1$ cannot be defined.

1 This is a main point in the likelihood principle, which is often cited in the literature of Bayesian tests of difference stationarity. See e.g. Lindley (1965: 58-69), Berger (1985: 28-33), and Poirier (1988) for more about the likelihood principle.
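The posterior probabilities of two hypotheses under proper priors can be computed numerically, as in the following sketch. Everything specific in it is an assumption made for illustration and not taken from the text: a normal sample mean of 0.3 from 25 observations with unit variance, $\Theta_0 = [-1, 0]$, $\Theta_1 = [0, 1]$, uniform priors $f_i$ within each set, and $\pi_0 = \pi_1 = 0.5$. The helper `trapezoid` is a hand-written integration rule.

```python
import numpy as np

def trapezoid(values, grid):
    """Trapezoidal rule on a one-dimensional grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

# Hypothetical data summary: sample mean of n observations from N(theta, 1).
n_obs, x_bar = 25, 0.3

def likelihood(theta):
    # Likelihood of theta up to a constant factor, which cancels in the ratio.
    return np.exp(-0.5 * n_obs * (x_bar - theta) ** 2)

theta0 = np.linspace(-1.0, 0.0, 20001)   # grid over Theta_0
theta1 = np.linspace(0.0, 1.0, 20001)    # grid over Theta_1
f0 = np.ones_like(theta0)                # uniform prior density within Theta_0
f1 = np.ones_like(theta1)                # uniform prior density within Theta_1
pi0, pi1 = 0.5, 0.5                      # prior probabilities of H_0 and H_1

# Numerators: pi_i times the integral of likelihood times prior over Theta_i.
w0 = pi0 * trapezoid(likelihood(theta0) * f0, theta0)
w1 = pi1 * trapezoid(likelihood(theta1) * f1, theta1)

alpha0 = w0 / (w0 + w1)                  # posterior probability of H_0
alpha1 = w1 / (w0 + w1)                  # posterior probability of H_1
print(round(alpha0, 3), round(alpha1, 3))
```

With these assumed numbers the observation moves the probability of $H_0$ from 0.5 to roughly 0.07, which illustrates the transformation from $\pi_i$ to $\alpha(\Theta_i)$.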

But the ratio between the posterior probabilities, i.e. the posterior odds ratio,

$$\frac{\alpha(\Theta_0)}{\alpha(\Theta_1)} = \frac{\int_{\Theta_0} f(x \mid \theta) f(\theta)\, d\theta}{\int_{\Theta_1} f(x \mid \theta) f(\theta)\, d\theta} \qquad (10.4)$$

can be defined in so far as the integrals in the numerator and the denominator are each bounded.

Finally, suppose that for $i = 0, 1$ the loss in accepting $H_i$ is zero if $H_i$ is true, and $k_i$ if $H_j$ ($j \ne i$) is true. The expected loss involved in accepting $H_1$, i.e. rejecting $H_0$, is $k_1 \alpha(\Theta_0)$, and the expected loss in accepting $H_0$, i.e. rejecting $H_1$, is $k_0 \alpha(\Theta_1)$. Therefore $H_1$ is accepted when

$$k_1 \alpha(\Theta_0) < k_0 \alpha(\Theta_1). \qquad (10.5)$$

The two hypotheses are treated symmetrically here, whereas it is well known that the null and the alternative hypotheses in the classical inference are not.

Readers who wish to know more about the Bayesian inference are referred to Lindley (1965), Zellner (1971), Leamer (1978), and Berger (1985).

10.2 Different Versions of the Hypotheses

For the sake of simplicity let us consider the model (5.2), where $|\rho| < 1$ in the trend stationarity, and $\rho = 1$ in the difference stationarity. Earlier in Section 2.3 it was pointed out that explosive roots of the characteristic polynomial cannot be absorbed in the difference-stationary model. The explosive root has been a priori ruled out in all the classical inference on the unit root. Along with Sims (1988) and Phillips (1991a), however, explosive roots are brought into consideration in the Bayesian testing. Three versions of hypothesis-testing have emerged in the literature.

I have several comments: (i) In the first version the difference stationarity is merged with explosive roots in $H_0$. This version of tests does not answer questions regarding the difference stationarity. Rather it is directed to discriminating the stationarity and the non-stationarity. (ii) In the second version $H_0$ is precisely the difference stationarity, but $\rho > 1$ and $|\rho| < 1$ are grouped together in $H_1$. These two situations entail entirely different meanings and different methods to cope with. (iii) In the third version the prior probability for $\rho > 1$ is zero. Explosive roots are a priori ruled out. (iv) There can be an argument that explosive roots may very well be reality, in which case the third version is rendered meaningless; in fact the discrimination between DS and TS is meaningless. This argument would lead to the fourth version, in which three hypotheses, namely difference stationarity ($\rho = 1$), trend stationarity ($|\rho| < 1$), and explosive roots ($\rho > 1$), are compared in terms of their posterior probabilities. It has not been investigated to the best of my knowledge.

There may be some scepticism about compounding a linear deterministic trend with an explosive $\rho$, though as $t \to \infty$ it is just an explosive trend. One might consider the following model of two regimes,

(10.6a)

(10.6b)

The first regime is nothing but (5.4), but $\rho = 1$ is excluded from it because $\rho = 1$ forces $\beta$ to zero, merging it to (10.6b). (10.6a) may be extended to include a deterministic trend more general than a linear.

10.3 Problems Associated with the First Version

In the classical test of a null hypothesis $H_0$ the P-value is the probability that a test statistic is farther away from $\Theta_0$ beyond its observed value. In the first version of discrimination in Section 10.2 it is seen (i) that the whole parameter space is divided between $\Theta_0$ and $\Theta_1$, and (ii) that $\Theta_i$ lies on the 'other side' of $\Theta_j$ ($i \ne j$). In such cases the P-value in the classical inference is identical to the Bayesian posterior probability in so far as we are concerned with standard models in which data are generated independently and identically.

For example, consider a regression model with only one fixed regressor,

(10.7)

where $\{\varepsilon_t\}$ is i.i.d., $\varepsilon_t$ is $N(0, \sigma_\varepsilon^2)$, and $\sigma_\varepsilon^2$ is known. The likelihood function is

(10.8)

Writing

(10.9)

where $\hat\beta$ is the OLS estimate of $\beta$, (10.8) is expressed as

(10.8')

Note that $\beta$ is not involved in either $A$ or $B$. When (10.8') is normalized so as to integrate to unity in the integration by $\beta$, (10.8') is the density function of a normal variate with mean $\hat\beta$ and variance $A^{-1}$, which is sketched in Figure 10.1(a).


Suppose that $H_0$ states $\beta \ge 1$, which is $\Theta_0$, and $H_1$ states $\beta < 1$, which is $\Theta_1$. Assume that $\pi_0 = \pi_1$, the prior distribution is flat, and $k_0 = k_1$ in the notations given in Section 10.1 above. The Bayesian decision, (10.5), is then reduced to the comparison between two shaded areas over $\Theta_0$ and $\Theta_1$ in Figure 10.1(a). $H_0$ is rejected if the area over $\Theta_0$ is smaller than that over $\Theta_1$ as in Figure 10.1(a).

In the classical hypothesis test of $H_0$ using $\hat\beta$ as the test statistic we consider the sampling distribution of $\hat\beta$ under the assumption that the true value of $\beta$ is unity. This sampling distribution is $N(1, A^{-1})$ and represented by a dotted curve in Figure 10.1(b). The P-value is the shaded area. Since the likelihood function in Figure 10.1(a) and the sampling distribution in Figure 10.1(b) share the common variance, $A^{-1}$, the posterior probability of $\Theta_0$ in Figure 10.1(a) is equal to the P-value in Figure 10.1(b). The equality would hold even if we switch the null hypothesis from $H_0$ to $H_1$.
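The equality between the flat-prior posterior probability of $\Theta_0$ and the classical P-value can be checked numerically. The sketch below is not the author's computation: the regression is written as $y_t = \beta z_t + \varepsilon_t$ with known $\sigma$, and the symbols `y` and `z`, the simulated data, and the helper `norm_cdf` are assumptions made for illustration. It uses only the facts stated in the text, namely that the normalized likelihood is normal with mean $\hat\beta$ and variance $A^{-1}$ and that the sampling distribution of $\hat\beta$ under $\beta = 1$ is $N(1, A^{-1})$.

```python
import math
import numpy as np

def norm_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

# Hypothetical data: one fixed regressor z and known error s.d. sigma.
rng = np.random.default_rng(0)
sigma = 1.0
z = np.linspace(0.5, 1.5, 50)
y = 0.9 * z + sigma * rng.standard_normal(z.size)

szz, szy, syy = np.sum(z * z), np.sum(z * y), np.sum(y * y)
A = szz / sigma**2            # inverse of the variance of beta_hat
beta_hat = szy / szz          # OLS estimate

# Bayesian side: flat prior, posterior N(beta_hat, 1/A), probability of beta >= 1.
post_prob_h0 = 1.0 - norm_cdf((1.0 - beta_hat) * math.sqrt(A))

# Classical side: beta_hat ~ N(1, 1/A) under beta = 1; P-value is the tail
# lying farther away from Theta_0 than the observed beta_hat.
p_value = norm_cdf((beta_hat - 1.0) * math.sqrt(A))

# Independent check: integrate the likelihood itself over beta >= 1 on a grid.
half_width = 10.0 / math.sqrt(A)
grid = np.linspace(beta_hat - half_width, beta_hat + half_width, 200001)
loglik = -0.5 * (syy - 2.0 * grid * szy + grid**2 * szz) / sigma**2
lik = np.exp(loglik - loglik.max())
numeric_post_prob_h0 = lik[grid >= 1.0].sum() / lik.sum()

print(post_prob_h0, p_value, numeric_post_prob_h0)
```

The first two numbers coincide because the posterior and the sampling distribution are the same normal shape centred at $\hat\beta$ and at 1 respectively; the third confirms the posterior probability without using the closed form.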

FIG. 10.1 Bayesian posterior distribution and sampling distribution: (a) Bayesian posterior distribution for $\beta$ or $\rho$, (b) Sampling distribution for $\hat\beta$, (c) Sampling distribution for $\hat\rho$


The likelihood ratio test is another version of the classical inference based upon the likelihood function. In testing the null hypothesis $H_0$ it compares the maxima of likelihoods over $\Theta_0$ and over $\Theta_1$. The former maximum is attained at $\beta = 1$, and the latter at $\hat\beta$. Therefore $(-2) \times$ (the logarithm of the likelihood ratio) is $A(1 - \hat\beta)^2$. This term is assessed by its sampling distribution under $H_0$, which is $\chi^2(1)$. The P-value should be identical to the one obtained in Figure 10.1(b) through $\hat\beta \sim N(1, A^{-1})$, and it is equal to the posterior probability of $H_0$.

This kind of relationship between the classical and the Bayesian inference holds in many models in so far as the data generation is standard.² Sims and Uhlig (1991) find that the relation does not hold in the first version of discrimination in Section 10.2.

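To simplify our exposition let us consider an AR(1)

$$x_t = \rho x_{t-1} + \varepsilon_t, \qquad t = 1, \ldots, T, \tag{10.10}$$

where $\varepsilon_t$ is distributed in $N(0, \sigma^2)$ and $\sigma^2$ is known. Conditionally upon $x_0$ the p.d.f. of $(x_1, \ldots, x_T)$ is

$$(2\pi\sigma^2)^{-T/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_t - \rho x_{t-1})^2\right\} \tag{10.11}$$

no matter whether $\rho <$ or $=$ or $> 1$. If $T$ is large the marginal p.d.f. of $x_0$ can be ignored even for $\rho > 1$.³ (10.11) can be rewritten as

$$(2\pi\sigma^2)^{-T/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_t - \hat\rho x_{t-1})^2\right\} \exp\left\{-\frac{A}{2} (\rho - \hat\rho)^2\right\}, \tag{10.11'}$$

where $\hat\rho$ is the OLS and

$$A = \sigma^{-2} \sum_{t=1}^{T} x_{t-1}^2. \tag{10.12}$$

(10.12) is analogous to (10.9). The likelihood function of $\rho$ is (10.11') suitably normalized. (10.11') is quite analogous to (10.8'). Continuing to assume the flat prior and $\pi_0 = \pi_1$ the Bayesian posterior distribution for $\rho$ can be expressed by Figure 10.1(a). The posterior probability of $\rho \ge 1$ is the area shaded as ///. The Bayesian analysis of $H_0$ vs. $H_1$ does not differ much from the regression model.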
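A minimal sketch of the calculation just described, on simulated data. The data-generating value $\rho = 0.95$, the seed, and $x_0 = 0$ are assumptions made only for illustration. The code forms the OLS estimate and the precision $A = \sigma^{-2} \sum x_{t-1}^2$, checks the sum-of-squares decomposition that underlies rewriting the likelihood around the OLS estimate, and evaluates the flat-prior posterior probability of $\rho \ge 1$.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_ar1(rho, T, sigma=1.0, x0=0.0, seed=0):
    """Return [x_0, x_1, ..., x_T] from x_t = rho * x_{t-1} + e_t."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(T):
        x.append(rho * x[-1] + rng.gauss(0.0, sigma))
    return x

def ols_and_precision(x, sigma=1.0):
    """OLS estimate of rho and A = sum(x_{t-1}^2) / sigma^2."""
    sxx = sum(v * v for v in x[:-1])
    sxy = sum(a * b for a, b in zip(x[:-1], x[1:]))
    return sxy / sxx, sxx / sigma ** 2

def sum_sq(x, rho):
    """Sum of squared one-step residuals at a given rho."""
    return sum((b - rho * a) ** 2 for a, b in zip(x[:-1], x[1:]))

sigma = 1.0
x = simulate_ar1(rho=0.95, T=100, sigma=sigma)   # hypothetical data
rho_hat, A = ols_and_precision(x, sigma)

# Decomposition behind the rewritten likelihood, evaluated at rho = 1:
# sum (x_t - rho x_{t-1})^2 = sum (x_t - rho_hat x_{t-1})^2 + sigma^2 * A * (rho - rho_hat)^2
lhs = sum_sq(x, 1.0)
rhs = sum_sq(x, rho_hat) + sigma ** 2 * A * (1.0 - rho_hat) ** 2

# Flat prior: posterior of rho is N(rho_hat, 1/A).
post_unit_or_explosive = 1.0 - normal_cdf((1.0 - rho_hat) * math.sqrt(A))
print(rho_hat, A, post_unit_or_explosive)
```

Nothing in this calculation depends on whether the true $\rho$ is below, at, or above unity, which is the sense in which the Bayesian analysis of the likelihood is standard.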

2 The equality was pointed out in Pratt (1965), and elaborated on in DeGroot (1973) and Casella and Berger (1987). In establishing the asymptotic equality in a general model the following theorem due to Lindley (1965: 129-31) and Walker (1969) is useful. If $T$ is sufficiently large and if the prior distribution is continuous, the posterior distribution is approximately a normal distribution with its mean vector equal to the maximum-likelihood estimator and its covariance matrix equal to the inverse of the second-order derivative of the log-likelihood function.

3 This can be shown by comparing the log-likelihood conditional upon $x_0$ and the log-likelihood for the marginal distribution of $(x_1, \ldots, x_T)$ after $x_0$ is integrated out (see Fukushige, Hatanaka, and Koto (1994)).

Let us turn to the classical inference and consider the likelihood ratio test of $H_0$ ($\rho \ge 1$) against $H_1$ ($0 < \rho < 1$). Suppose that $\hat\rho < 1$ happens to hold in the data. The test statistic is $A(\hat\rho - 1)^2$, which is evaluated in terms of the sampling distribution of $\hat\rho$ assuming that the true value of $\rho$ is unity. The asymptotic distribution is not $\chi^2(1)$, but

$$\left(\int_0^1 W(r)\,dW(r)\right)^2 \Big/ \int_0^1 W(r)^2\,dr,$$

where $W(r)$ is the standard Wiener process.

The asymptotic P-value of $\hat\rho$ can be obtained. Alternatively one could use simply $T(\hat\rho - 1)$ as the test statistic. The sampling distribution is sketched as the dotted curve in Figure 10.1(c). It is non-normal and skewed to the left of $\rho = 1$. The P-value is the shaded area, and it is not equal to the Bayesian posterior probability of $\rho \ge 1$ in Figure 10.1(a).

When a unit root is involved the likelihood function is standard but the sampling distribution is not, and as a result the P-value and the posterior probability diverge. Sims and Uhlig (1991) demonstrate this point visually through a simulation study with $T = 100$.
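The non-standard shape of the sampling distribution is easy to see in a small Monte Carlo experiment. The sketch below is written in the spirit of such a simulation but is not a replication of Sims and Uhlig (1991); the number of replications, the seed, $\sigma = 1$, and $x_0 = 0$ are assumptions. It draws random walks with $T = 100$, computes $T(\hat\rho - 1)$ for each, and reports the share of estimates below unity. A symmetric distribution centred at unity would give a share near one half.

```python
import random

def rho_hat_under_unit_root(T, rng):
    """OLS estimate of rho from one random walk x_t = x_{t-1} + e_t, x_0 = 0."""
    x_prev, sxx, sxy = 0.0, 0.0, 0.0
    for _ in range(T):
        x_next = x_prev + rng.gauss(0.0, 1.0)
        sxx += x_prev * x_prev
        sxy += x_prev * x_next
        x_prev = x_next
    return sxy / sxx

rng = random.Random(12345)   # fixed seed so that the run is reproducible
T, reps = 100, 2000
stats = [T * (rho_hat_under_unit_root(T, rng) - 1.0) for _ in range(reps)]

share_below_unity = sum(s < 0.0 for s in stats) / reps
mean_stat = sum(stats) / reps
print(f"share of rho_hat < 1: {share_below_unity:.3f}")
print(f"mean of T(rho_hat - 1): {mean_stat:.3f}")
```

The share well above one half and the negative mean reflect the leftward skew of the sampling distribution, whereas the flat-prior posterior for $\rho$ given any one data set remains a symmetric normal curve around $\hat\rho$.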

I have assumed a flat prior of $\rho$ so that the posterior has been proportional to the likelihood function. Phillips (1991b) criticizes a flat prior as a non-informative prior. He stresses that the prior should incorporate the property of the model as it is not meant to be prior to the model (but only to observations). In the standard regression model, $y_t = \beta x_t + \varepsilon_t$, the Fisher information does not involve $\beta$, and the flat prior of $\beta$ may be regarded as non-informative. In the autoregressive model (10.10) the Fisher information does involve $\rho$, and the flat prior of $\rho$ is not non-informative. The Jeffreys non-informative prior is proportional to the square root of (the determinant of) the Fisher information (matrix). The best known example is $\sigma^{-1}$ for $\sigma$. Its advantage is that the result of Bayesian inference is invariant through transformations of parameters if this prior is adopted (see Zellner (1971: 47-8)). The point is relevant if one wishes to keep the results comparable between annual and quarterly data. See Section 2.5.

As for the model (10.6a), in which $|\rho| < 1$, the Jeffreys prior density for $(\mu, \beta, \rho, \sigma)$ is approximately

(10.13)

However, for $\rho > 1$ the density rises at the speed of $\rho^{2T}$, which would be contrary to many researchers' judgement. See Leamer (1991), Poirier (1991), Sims (1991), Wago and Tsurumi (1991), and Phillips (1991c). In my view the prior should be model-dependent, and hence its density should vary with $\rho$, but no economists would accept the Jeffreys prior.
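The dependence of the Fisher information on $\rho$ can be illustrated in a deliberately simplified setting. The sketch below assumes the pure AR(1) $x_t = \rho x_{t-1} + \varepsilon_t$ with known $\sigma$ and $x_0 = 0$, not the model (10.6a) with its deterministic terms, so it does not reproduce (10.13). It computes the expected information $\sigma^{-2} E[\sum x_{t-1}^2]$ by recursion and takes its square root as an unnormalized Jeffreys-type density; the function names are hypothetical.

```python
import math

def expected_information(rho, T, sigma=1.0):
    """E[sum_{t=1}^T x_{t-1}^2] / sigma^2 for x_t = rho * x_{t-1} + e_t, x_0 = 0."""
    info = 0.0
    var_x = 0.0                      # Var(x_{t-1}); starts at Var(x_0) = 0
    for _ in range(T):
        info += var_x / sigma ** 2
        var_x = rho * rho * var_x + sigma ** 2
    return info

def jeffreys_density_unnormalized(rho, T):
    # Square root of the information for the single parameter rho.
    return math.sqrt(expected_information(rho, T))

T = 100
for rho in (0.5, 0.9, 1.0, 1.05, 1.1):
    # In this simplified case the information is dominated by a term
    # of order rho^(2T) once rho exceeds unity.
    print(rho, expected_information(rho, T), jeffreys_density_unnormalized(rho, T))
```

Even in this stripped-down case the density is nearly flat well inside the stationary region and rises very steeply once $\rho$ exceeds unity, which is the feature that many researchers find hard to accept as a statement of prior belief.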

DeJong and Whiteman (1991a, 1991b) perform a kind of the first version of testing, in which the discrimination is sought between $H_0: \rho \ge 0.975$ and $H_1: \rho < 0.975$. Their losses, $k_0$ and $k_1$, are such that $H_0$ is accepted if and only if $P[\rho \ge 0.975] > 0.05$. The model is (10.6a) extended to $\rho \ge 1$ as well as $|\rho| < 1$, while leaving $\mu$, $\beta$, and $\rho$ variation free. Compare this with Section 5.2. In one prior $H_1$ is rejected only in CPI, bond yield, and velocity among the fourteen time series in Nelson and Plosser (1982). Some alternative priors add to this list nominal GNP, nominal stock price, and nominal wage. They observe a high negative correlation between $\beta$ and $\rho$, but this may be explained by my theoretical observation given in Appendix 3.3.

I conclude the present Section as follows. The Bayesian discrimination between $H_0^{(1)}$ and $H_1^{(1)}$ is standard in so far as the likelihood function is concerned. In particular it does not require non-standard distributions such as those based on the Wiener process. However, the Bayesian discrimination between $H_0^{(1)}$ and $H_1^{(1)}$ is non-standard in requiring the prior distribution to be model-dependent and thus ruling out a flat prior for $\rho$. The latter point holds even when the domain of $\rho$ is confined into $(-1, 1]$ as in the third type of discrimination.

10.4 Point Null Hypotheses

The hypotheses, $H_0^{(2)}$, $H_0^{(3)}$, and $H_0^{(4)}$, each restrict $\rho$ to unity. The domains of parameters ($\rho$ and all nuisance parameters) specified by these hypotheses each have zero measure in the whole space of parameters. Such hypotheses are called point (or sharp) null hypotheses.

The Bayesians begin their argument with the remark that many point null hypotheses are not what researchers really wish to test. Concerning the difference stationarity the argument is that we really wish to test $\rho \in [1 - \delta, 1 + \delta]$ rather than $\rho = 1$. Then, turning to the case where researchers really wish to test a point null hypothesis, the Bayesians such as Berger (1985: 156) admit that their inference does not work. In particular, no matter what $\pi_0$ and $\pi_1$ may be, $\alpha(\Theta_0)$ defined by (10.3) for a point null hypothesis $H_0: \theta = \theta_0$ is approximately unity, when $T$ is large and the distance between $\theta_0$ and the mode of the likelihood function is in the order of $T^{-1/2}$. This is called Jeffreys paradox or Lindley paradox.⁴ (Recall that the local alternative in the classical hypothesis testing deviates from $\theta_0$ by the order of $T^{-1/2}$, and yet the test statistic rejects the null hypothesis when the P-value is small.) Readers may be relieved by the fact that long time-series data are not available to us. With small $T$, however, the Bayesian inference still faces two problems: (i) restrictions to our choice of prior distributions and (ii) an excessive favour on the point null hypothesis.
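The paradox can be made concrete in a textbook normal-mean setting. The sketch below is an illustration under assumptions not taken from the text: the sample mean is $N(\theta, \sigma^2/T)$, a prior mass $\pi_0$ sits on $\theta_0$, and the remaining mass is spread as $N(\theta_0, \tau^2)$ under $H_1$. The standardized distance $z = \sqrt{T}(\bar{x} - \theta_0)/\sigma$ is held fixed, so the classical P-value stays the same while $T$ grows.

```python
import math

def posterior_prob_point_null(z, T, pi0=0.5, tau2_over_sigma2=1.0):
    """Posterior probability of H0: theta = theta_0 at a fixed z-value.

    Assumed setup: normal likelihood, prior mass pi0 on theta_0, and a
    N(theta_0, tau^2) prior density under H1 (an illustrative choice).
    """
    r = T * tau2_over_sigma2
    # Bayes factor of H0 against H1: sqrt(1 + r) * exp(-(z^2 / 2) * r / (1 + r)).
    log_b01 = 0.5 * math.log1p(r) - 0.5 * z * z * r / (1.0 + r)
    posterior_odds = (pi0 / (1.0 - pi0)) * math.exp(log_b01)
    return posterior_odds / (1.0 + posterior_odds)

z = 1.96   # two-sided P-value of about 0.05 at every T
for T in (1, 10, 100, 1000, 10**6):
    print(T, round(posterior_prob_point_null(z, T), 3))
```

With the P-value fixed, the posterior probability of the point null rises steadily with $T$ and approaches unity, which is the behaviour the paradox describes.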

10.4.1 Restrictions to priors

I begin with the first problem, restrictions to prior distributions. Despite the emphasis upon the subjective probability many Bayesians in econometrics and other disciplines perform an objective analysis by adopting some non-informative prior and/or by checking the sensitivity of posterior distributions to different prior distributions. The latter should be recommended in every study, but it is often the case that non-informative priors cannot be adopted in testing a point null hypothesis.

4 The statement follows from the reasoning in Berger (1985: 148-56) and Zellner (1971: 302-6), assuming that data maintain the same P-value while $T \to \infty$.

Suppose that a model contains only a scalar parameter, $\theta$, and that $H_0$ states that $\theta$ is a point $\theta_0$ while $H_1$ states that $\theta \ne \theta_0$. Let us consider the posterior odds ratio (10.4) with a continuous density function $f(\theta)$ as a prior. Then the prior probability of $\theta = \theta_0$ is zero, which forces the posterior probability to zero. No matter whether $f(\cdot)$ is proper or improper, a continuous p.d.f. cannot be adopted as a prior.

Consider then a narrow interval (null) hypothesis $H_\delta$ that specifies $\theta$ to be in $\Theta_\delta = [\theta_0 - \delta, \theta_0 + \delta]$. In the framework of (10.1) and (10.2) the prior probability of $H_\delta$ is $\pi_0$, which is distributed within $\Theta_\delta$ according to $f^{(0)}(\theta)$. If $f^{(0)}(\theta)$ is uniform in $\Theta_\delta$, the posterior probability of $H_\delta$ is

As $\delta \to 0$, $H_\delta$ converges to the point null hypothesis $H_0$, and we have

(10.14)

by virtue of the mean value theorem. In testing $H_0$ we should place a discrete probability mass $\pi_0$ on the point $\theta_0$ in the prior distribution. We would then get the posterior probability (10.14) by combining $\pi_0$ with the likelihood at $\theta = \theta_0$. It should also be clear that there is not much difference between a point ($\rho = 1$) and a narrow interval ($1 - \delta < \rho < 1 + \delta$) hypothesis in so far as the likelihood function is smooth about $\theta_0$. This would be the case unless $T$ is very large so that most of the likelihood is concentrated about $\theta_0$.⁵

It is worth noting that the discrete mass in the prior induces a discrete mass on $\theta_0$ in the posterior probability distribution.

Continuing the explanation on the scalar parameter, we place on $H_1$ the prior probability $\pi_1 = 1 - \pi_0$, which is distributed through $f^{(1)}(\theta)$ over the whole straight line except the point $\theta_0$. The function $f^{(1)}(\cdot)$ must integrate to unity as specified in (10.2) because otherwise the prior probability of $H_1$ cannot be made equal to $\pi_1$. Thus any improper p.d.f. is ruled out for $f^{(1)}(\cdot)$. Unfortunately most known non-informative priors are improper.
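The posterior probabilities discussed in the preceding paragraphs can be sketched in a standard form. The symbol $L(\theta)$ for the likelihood is an assumption introduced here for illustration, and the displays are one way of writing out the argument consistent with the surrounding text rather than a transcription of the book's own equations.

```latex
% Narrow interval hypothesis H_delta with f^{(0)} uniform on
% Theta_delta = [theta_0 - delta, theta_0 + delta], i.e. density 1/(2 delta):
\[
P[H_\delta \mid \text{data}]
  = \frac{\pi_0 (2\delta)^{-1} \int_{\theta_0-\delta}^{\theta_0+\delta} L(\theta)\, d\theta}
         {\pi_0 (2\delta)^{-1} \int_{\theta_0-\delta}^{\theta_0+\delta} L(\theta)\, d\theta
          + \pi_1 \int_{\Theta_1} L(\theta)\, f^{(1)}(\theta)\, d\theta} .
\]
% By the mean value theorem the averaged integral tends to L(theta_0) as delta -> 0,
% which is what a discrete prior mass pi_0 on theta_0 delivers directly:
\[
\lim_{\delta \to 0} P[H_\delta \mid \text{data}]
  = \frac{\pi_0 L(\theta_0)}
         {\pi_0 L(\theta_0) + \pi_1 \int_{\Theta_1} L(\theta)\, f^{(1)}(\theta)\, d\theta} .
\]
```

The second display also shows why $f^{(1)}(\cdot)$ has to be proper: an improper density would leave the integral in the denominator without a fixed scale relative to $\pi_0 L(\theta_0)$.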

5 See Berger and Delampady (1987) for more about this point.

Let us then consider a model which contains a vector parameter $\theta = (\theta_1, \ldots, \theta_k)$. As a simple example of $\Theta_0$ having zero measure, suppose that $\theta_1$ is restricted to $\theta_1 = 1$ but $(\theta_2, \ldots, \theta_k)$ is free in $\Theta_0$. The $\theta$s not in $\Theta_0$ form $\Theta_1$. Suppose that $\theta_1$ and $(\theta_2, \ldots, \theta_k)$ are independent in the prior distribution so that the p.d.f. is $\pi_1 f(\theta_1) f(\theta_2, \ldots, \theta_k)$ in $\Theta_1$ and $P[\theta_1 = 1$ and $(\theta_2, \ldots, \theta_k) \in$ an infinitesimal cube about $(\theta_2, \ldots, \theta_k)] = \pi_0 f(\theta_2, \ldots, \theta_k)\, d\theta_2 \ldots d\theta_k$. Then $f(\theta_2, \ldots, \theta_k)$ need not be proper because the posterior odds ratio (10.4) in the present case is

(10.15)

The outer integral in the denominator of (10.15) is equal to that which excludes $\theta_1 = 1$ because all functions in the integrand are continuous.⁶

10.4.2 Favour on the point null

I now turn to the second problem, an excessive favour on the point null hypothesis. Consider a model that contains only a scalar parameter $\theta$. $H_0$ is $\theta = \theta_0$ and $H_1$ is $\theta \ne \theta_0$. Suppose that we are given data such that the likelihood at $\theta_0$ is low compared with the highest value of the likelihood function. The P-value would be small no matter what test statistic is used in the classical inference. But in the Bayesian approach a discrete mass of prior probability is placed on $\theta_0$, and it would produce a large value of $\alpha(\theta_0)$, which may be regarded as undue favour on the point null hypothesis by the classical inference. If the hypothesis is not a point null, effects of the prior upon the posterior distributions eventually die down as $T \to \infty$ no matter whether the prior is informative or non-informative. When the hypothesis is a point null, effects of the discrete mass in the prior stay on while $T \to \infty$, as the Lindley paradox indicates.

Pure Bayesians reject outright the idea of comparing the $\alpha(\theta_0)$ to the P-value, but some Bayesians suggest how to protect researchers from an undue favour on a point null hypothesis.⁷ In regard to the test of difference stationarity, which is a point null hypothesis, many researchers, Koop (1992) and Wago and Tsurumi (1991) among others, present the size and power of their Bayesian decision procedures (10.5) in terms of the relative frequency in hypothetical repetitions of data generation.

It is now clear that the second and the fourth versions of the testing in Section 10.2 involve both the first and the second problems due to point null hypotheses, whereas the third version has the second problem. The third version is free from the first problem because it deals with a bounded domain of parameters.

Concluding my explanations in Sections 10.3 and 10.4 I warn that we should not expect the results of the classical and the Bayesian inference to be related in a simple manner. In general, the parameter is a random variable in one, and a constant in the other. The inference is conditioned on just one set of available data in one, and based on a hypothetical sequence of data generations in the other. In particular, concerning the four versions of discrimination mentioned in Section 10.2, the Bayesian analysis of the likelihood function is standard, but the classical analysis is non-standard as demonstrated in Chapters 4-7. Bayesian prior distributions raise a number of problems. Use of non-informative priors is very much limited,⁸ and this makes it difficult to compare the Bayesian and the classical results on the unit-root problems. However, the third version of discrimination, $H_0^{(3)}$ vs. $H_1^{(3)}$, which is the discrimination between DS and TS, is the least affected by the difficulties on prior distribution.

6 It is assumed that $f(\theta_1)$ integrates to unity over $\Theta_1$ and that the integrals in the numerator and the denominator of (10.15) are both bounded.

7 See Berger and Sellke (1987).

10.5 Results in the Second and the Third Versions

The test in Koop (1992) deals with the second version in spite of the announcement in its introduction that the TS will be tested against the DS. The model is

(10.16)

where $\{\varepsilon_t\}$ is i.i.d. The DS is represented by $\beta_1 + \beta_2 + \beta_3 = 1$, upon which $\pi_0 = 0.5$ is placed. The priors are proper and symmetric about this line, using the Cauchy distribution and a family of distributions developed in Zellner (1986), which is called the g-priors. Koop (1992) analyses the Nelson-Plosser historical data and concludes that the DS is rejected in most of real variables but not rejected in nominal variables though this conclusion should not be taken as unequivocal.

Schotman and van Dijk (1991a) perform the third version of testing on the real exchange rate. The model is

(10.17)

where $\{\varepsilon_t\}$ is Gaussian i.i.d. Unlike DeJong and Whiteman (1991a, b) they follow the modelling in the Dickey-Fuller test in Section 5.2 so that $\mu$ and $\rho$ are not variation free. Since $\mu$ is not identified asymptotically at $\rho = 1$, any non-informative prior on $\mu$ leads to the posterior density diverging to $\infty$ at $\rho = 1$ as $T \to \infty$. They think of a prior of $\mu$ conditional upon $\rho$ such that ranges of uniform distributions diverge to $\infty$ as $\rho$ approaches unity from below. The idea may be useful to analyse the model of two regimes, (10.6a) (10.6b), but actually they choose some other prior on $\mu$. They also choose the prior of $\rho$ uniform over $[A, 1]$, where $A$ is chosen so that most of the likelihoods are contained in this interval. Setting $\pi_0 = \pi_1 = 0.5$ the posterior probabilities of $\rho = 1$ in various real exchange rates range between 0.7 and 0.3.

8 Occasionally a point null hypothesis is examined by the HPDI (highest posterior density interval) significance test. It checks if $\theta_0$ is contained in a confidence interval based on the posterior distribution. See Lindley (1965: 58-61). If a continuous prior is used I would follow the criticism of the method by Berger and Delampady (1987) to the effect that it does not assess the distinguished point, $\theta_0$. The point accepted by the decision rule (10.5) may very well be outside of the confidence interval.

Schotman and van Dijk (1991b) extend (10.17) to

(10.17')

and apply it to the Nelson and Plosser historical data extended up to 1988. The results are that real variables are trend stationary, but nominal variables, prices, interest rates, velocity, and stock price are not. Schotman and van Dijk (1993) extend (10.17') to

and place a prior on $c(1) = 1 - c_1 - \ldots - c_p$. The prior on $\mu$ is normal conditionally upon $c(1)$ and $\sigma_\varepsilon$, and the conditional variance diverges as $c(1)$ approaches zero (i.e. unit root). They analysed the annual post-war data of real outputs in sixteen OECD countries with $p = 3$. Those of eight countries are judged DS.

Note that only Schotman and van Dijk (1991a, b, 1993) deal directly with the discrimination between DS and TS in Bayesian terms. It is interesting that the Bayesian discrimination gives results relatively more favourable to TS than the classical tests for DS against TS do.

I have not seen Bayesian analyses applied to the post-war data except for the exchange rate and annual data of real outputs in Schotman and van Dijk (1991a, 1993).

PART II

Co-integration Analysis in Econometrics

Overview

The co-integration research has made enormous progress since the seminal Granger representation theorem was proven in Engle and Granger (1987). A wide range of splendid ideas has emerged. Indeed, the development has been so rapid and divergent that the current state seems a little confusing. Part II presents the co-integration as I view it as an econometrician. My view has been influenced by my experiences with macroeconomic time-series data obtained in writing Part I. What I learned was that modes of deterministic trends are often difficult to determine, and that, partly because of this, integration orders are not easy to identify in some variables. The writing of Part II on the co-integration is thus orientated to giving more detailed consideration to deterministic trends, and to raising appreciation of the methods that are robust to the integration orders and different modes of deterministic trends. My view has also been influenced by results of simulation studies by my colleagues in Japan on the limited length of data. It leads me to mild scepticism about usefulness (for macroeconomics) of the widely applied analysis of co-integrated VAR (vector autoregressive process) in Johansen (1991a), which otherwise excels all other inference methods. The methods which are developed by Phillips (1991a) on the econometric motivation assume rather than estimate some aspects of the co-integration. They might be of some use, however, unless the assumption is definitively rejected by the VAR method, which I do not imagine to happen frequently. These are the basic grounds on which Part II is written.

Chapter 11 compares two modelling strategies because this topic is related to the judgement just made in regard to the VAR and 'econometric' analyses of the co-integration. One of the two strategies is to start with the VAR modelling with as few economic theories as possible, and has been proposed most vigorously in Sims (1980b). The other is to incorporate fully the economic theories into the models, and has been the tradition of econometrics. The ideal procedure is undoubtedly to start with the minimum of economic theories and test rather than assume the validity of the theories. Its effective implementation is often hindered by limitations to the length of the time-series data available for macroeconomic studies. Chapter 11 also serves the purpose of introducing a number of basic concepts on the multiple economic time series such as Granger causality and weak exogeneity. Chapter 12 explains the co-integration and the Granger representation theorem. It also discusses at some length the relationship between the economic theories and the co-integration analysis especially about the meaning of long-run equilibrium. The methods that were called above 'econometric' are presented in Chapters 13 and 14. In Chapter 13 are explained the methods developed in Phillips (1991a) among others on the co-integrated (rather than spurious) regressions. As for the computations they involve nothing more than conventional t- and F-statistics, and, as for the distributions, nothing more than the standard normal and $\chi^2$ distributions. Nevertheless we are freed from worrying about the correlation between the regressors and disturbance if the disturbance is stationary. In Chapter 14 these theories are applied to the inference problems on dynamic models such as the Hendry model and a class of general models that includes the linear, quadratic model in Kennan (1979), which is a basis for many empirical studies of rational expectations. The method in Johansen (1991a) is explained in Chapter 15, acknowledging the criticism by Phillips (1991d) but also clarifying what the method contributes to. This chapter also includes a brief summary of the simulation studies mentioned above. My opinion expressed in the first paragraph about the VAR and 'econometric' approaches is explained in Sections 13.4.2 and 13.4.3, and Sections 15.1.4, 15.1.5, 15.2.1, and 15.4.1.

A large number of topics are left out in my survey. (i) In Part I it has been discovered that CPI and money stock may well be I(2), but I shall assume throughout Part II that the integration orders are at most unity.¹ Other topics that are not included in the book are (ii) the multi-co-integration, which may well be useful for studying stock-flow relationships,² (iii) the Bayesian studies,³ (iv) heteroskedasticity in the co-integration,⁴ (v) errors in variables,⁵ (vi) time-varying parameters,⁶ (vii) the asymmetric adjustment,⁷ (viii) the non-linearity,⁸ (ix) panel data,⁹ (x) qualitative data,¹⁰ (xi) ARCH models,¹¹ and (xii) forecasts.¹² I have transferred to Appendix 6 the explanation of methods that utilize a non-parametric estimation of the long-run covariance matrix on the ground explained in Section 7.3.3 of Part I. I have not included the potentially useful test on the restriction on VAR coefficients (such as Granger non-causality) in Toda and Yamamoto (1995). The test does not require any pretesting on the integration order or the co-integration rank, but it does not fit in the organization of Part II.¹³

1 See Johansen (1992a, 1992e) and Juselius (1994) for I(2) and Davidson (1991) and Engle and Yoo (1991) for a general treatment of integration orders. Park and Phillips (1989), Stock and Watson (1993), and Haldrup (1994) extend the analysis of co-integrated regression in Chapter 13 to higher integration orders. Engsted (1993), Johansen (1992d), Stock and Watson (1993), Juselius (1994), and Haldrup (1994) present empirical studies dealing with I(2) variables. See also the literature on the multi-co-integration given in n. 2.

2 The multi-co-integration is defined and analysed in Granger and Lee (1989, 1990), and also analysed in Johansen (1992a) and Park (1992). See Granger and Lee (1989) and Lee (1992) for the applications.

3 DeJong (1992), Phillips (1992a), Kleibergen and van Dijk (1994), Chao and Phillips (1994), and Tsurumi and Wago (1994).

4 Hansen (1992*).

5 Nowak (1991).

6 Canarella, Pollard, and Lai (1990), and Granger and Lee (1991).

7 Granger and Lee (1989).

8 Granger (1991, 1993) and Granger and Hallman (1991).

9 Levin and Lin (1992) and Quah (1993) present a univariate unit root test on panel data. I do not know of any literature on the co-integration on panel data.

10 McAleer, McKenzie, and Pesaran (1994).

11 Bollerslev and Engle (1993).

12 Engle, Granger, and Hallman (1989).

A number of research topics clearly suggest themselves in the course of my survey, but there is no time to pursue them.

The expression of the characteristic polynomial used in mathematical economics is abandoned in Part II, and instead the convention in time-series analysis is adopted. See Section 2.3 (Part I) for the relation between characteristic polynomials in the time-series analysis and the mathematical economics.

The readers who are interested in the regression analysis rather than the multiple time series may start from Chapter 13. Major prerequisites for the understanding of Chapter 13 are Chapters 4-7 rather than Chapters 11 and 12. Readers will learn that the new econometrics differs from the traditional one even at the level of a simple regression model. However, an important portion of Chapter 13, that is Sections 13.4.2 and 13.4.3, cannot be understood without Chapters 11 and 12.

13 Readers are referred to Hargreaves (1993) for a nice survey in a summary style, covering more methods than Part II does.

11 Different Modelling Strategies on Multiple Relationships

11.1 Economic Models and Statistical Models

11.1.1 Simultaneous equations model

The present book is primarily concerned with econometrics for macroeconomics, and the economic models here describe economic theories on the interrelations among macroeconomic variables. The variables are aggregate measures of quantities of transactions and of prices concerning goods and assets. The interrelations among these measures originate in the arbitraging and the optimizing behaviour of economic agents. The time units relevant to empirical investigations of these interrelations are those in which the data of the aggregate measures are available, usually a quarter. The interrelations suggest certain regularities upon the way in which past measures affect current measures. The interrelations may not completely determine the current measures, given the past, and the parts that are left indeterminate are accounted for by random variables with zero mean. In fact the shocks that play a fundamental role in many macroeconomic theories are the parts of current variables that are not determined by their past measures. Since one quarter is much longer than the time unit in which an economic agent decides on its transactions, the interrelations also suggest regularities on the way in which current measures of different variables are determined jointly with mutual interactions within the same quarter.

Functional forms of interrelations are not specified by economic theories. Econometrics has adopted, in fact too easily, a linear approximation to the interrelations among macroeconomic variables. We thus arrive at what is called the linear simultaneous equations model in econometrics,

$$A_0 x_t = A_1 x_{t-1} + A_2 x_{t-2} + \cdots + u_t, \tag{11.1}$$

where $\{x_t\}$ is a $k$-element vector stochastic process of macroeconomic variables, and $\{u_t\}$ is a vector stochastic process with zero mean. $\{x_t\}$ is observable, but $\{u_t\}$ is not. In principle, all of the $k \times k$ coefficient matrices, $A_0, A_1, \ldots$ are to reflect economic theories on the interrelations among elements of $\{x_t\}$. $A_0$ is assumed to be non-singular. For simplicity we may assume that $\{u_t\}$ is i.i.d. with $E(u_t) = 0$ and $E(u_t u_t') = \Omega$. Equating each of $k$ elements of both sides of (11.1) yields $k$ equations each called a structural equation emphasizing that it reflects economic theories on the agents and markets.

11.1.2 Identification

The identification of a (vector) parameter is defined as follows. Given a model, the entire space of parameter values is divided into observationally equivalent classes. Two points belong to the same class if and only if they produce the identical probability distribution of observations. If each class contains no more than a single point, the parameter is identified. In many models the parameters are identified from the outset. In some models the parameters are identified after some normalization rules are introduced. In (11.1) the normalization sets the diagonal elements of $A_0$ to unities, which is innocuous for the purpose of the analysis. In Chapters 13 and 15 we shall find some normalization rule that is not innocuous. In (11.1) the remaining part of parameters, i.e. the off-diagonal elements of $A_0$ and $A_1, \ldots$, and distinct elements of $\Omega$, are still left unidentified. In fact a model obtained by left multiplication of both sides of (11.1) by a non-singular matrix produces the same probability distribution of the stochastic process $\{x_t\}$ as the original model.
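The last statement can be verified on a small numerical example. The sketch below assumes a two-variable model with one lag, written as $A_0 x_t = A_1 x_{t-1} + u_t$ with disturbance covariance $\Omega$; this form and all numerical values are hypothetical and chosen only for illustration. It shows that the structures $(A_0, A_1, \Omega)$ and $(F A_0, F A_1, F \Omega F')$ imply the same reduced-form coefficient matrix and the same reduced-form error covariance, so that observations on $\{x_t\}$ cannot tell them apart.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(m):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def reduced_form(A0, A1, Omega):
    """Return (A0^{-1} A1, A0^{-1} Omega A0^{-1}') for the assumed one-lag model."""
    A0_inv = inv2(A0)
    B1 = matmul(A0_inv, A1)
    Sigma_v = matmul(matmul(A0_inv, Omega), transpose(A0_inv))
    return B1, Sigma_v

# Hypothetical structural parameters.
A0 = [[1.0, 0.5], [-0.3, 1.0]]
A1 = [[0.6, 0.0], [0.2, 0.4]]
Omega = [[1.0, 0.2], [0.2, 2.0]]

# Any non-singular F yields an observationally equivalent structure.
F = [[2.0, 1.0], [0.5, 1.5]]
A0_alt = matmul(F, A0)
A1_alt = matmul(F, A1)
Omega_alt = matmul(matmul(F, Omega), transpose(F))

B1, Sigma_v = reduced_form(A0, A1, Omega)
B1_alt, Sigma_v_alt = reduced_form(A0_alt, A1_alt, Omega_alt)
print(B1, B1_alt)
print(Sigma_v, Sigma_v_alt)
```

The transformed structure generally violates the normalization of the diagonal of $A_0$ and whatever zero restrictions the original one obeyed, which is why such constraints can restore identification.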

In general, any constraint upon parameters eliminates from each (observationally equivalent) class those points that violate the constraint. If a single point or none is left in each class, the parameter is identified by the constraint. When every class has one and only one point, the parameter is said to be just-identified. When some classes are void and the other classes have only one point, the parameter is said to be overidentified. In regard to (11.1) proponents of the simultaneous equations model assumed that the economic theories provided constraints upon $A_0, A_1, \ldots$ and $\Omega$ that are sufficiently strong to identify $A_0, A_1, \ldots$ and $\Omega$ completely.

In what follows, when a (vector) parameter can be identified only with a normalization rule or a constraint provided by economic theories, I shall state the condition explicitly. When the parameter is identified (unidentified) without a normalization rule or a constraint, I shall simply write identified (unidentified).

11.1.3 VAR model

Sims (1980b) argued that applications of the simultaneous equations model to macroeconomics had in fact relied upon artificial constraints unjustified by economic theories in order to achieve the complete identification. In my judgement the criticism of Sims (1980b) is well acknowledged by econometricians.

As an alternative to the simultaneous equations model Sims (1980b) proposed the VAR model,

$$x_t = B_1 x_{t-1} + B_2 x_{t-2} + \cdots + v_t, \tag{11.2}$$

where $\{v_t\}$ is i.i.d. with zero mean. (11.1) and (11.2) are alike and in fact (11.2) follows from (11.1) by multiplying $A_0^{-1}$ on both sides. But an important difference between (11.1) and (11.2) is that the coefficient matrices, $B_1, B_2, \ldots$ in (11.2) are identified without any constraints derived from economic theories. Moreover, the VAR model is a statistical model because $B_1, B_2, \ldots$ are determined solely on the basis of how well (11.2) fits the time-series data of $\{x_t\}$.

As pointed out above, a VAR model is derived from the simultaneous equations model (11.1) by multiplying $A_0^{-1}$ on both sides. Constraints on $A_0, A_1, \ldots$ due to economic theories may be transferred to $B_1, B_2, \ldots$. Thus some aspects of economic theories may place constraints upon the parameters of VAR models. Since $B_1, B_2, \ldots$ are identified without the aid of economic theories, the constraints provide overidentifying restrictions, which can be tested by observations. Moreover, in many such studies one does not have to deal with structural equations explicitly, thus motivating empirical studies of economic theories on the basis of the VAR rather than the simultaneous equation. There is a large amount of literature on this topic. I mention here tests of rational-expectations hypotheses on a co-integrated VAR in Shiller (1981b), Campbell and Shiller (1987, 1988), and Kunitomo and Yamamoto (1990).

11.1.4 Comparison

I shall now compare the simultaneous equations model (11.1) and VAR (11.2). In applications of (11.2) the lag orders are truncated at $p$ so that the coefficient matrices are $B_1, \ldots, B_p$.¹ Altogether they involve $pk^2$ parameters when $x_t$ has $k$ elements. Since the sample size available for macroeconomic studies is 100 to 150, the VAR includes too many parameters in comparison with the sample size. On the other hand, the simultaneous equations model has many fewer (effective) parameters because of the constraints placed on the matrices $A_0, A_1, \ldots$. It must be admitted that economic theories do not constrain either lag orders or dynamic lag structures represented by $A_1, A_2, \ldots$ in (11.1), but in practice a large part of elements of $A_1, A_2, \ldots$ is suppressed to zero in the simultaneous equations model. The validity of such practices may be doubtful, but they contribute to reducing the number of unknown parameters to be estimated.
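The arithmetic behind this concern is simple and can be tabulated. The sketch below counts the $pk^2$ coefficients of a VAR with $k$ variables and $p$ lags and compares them with the $Tk$ data points available from $T$ observations; the particular values of $k$ and $p$ are hypothetical, while the sample sizes 100 and 150 are those mentioned above.

```python
def var_coefficient_count(k, p):
    """Number of coefficients in B_1, ..., B_p for a k-variable VAR with p lags."""
    return p * k * k

def data_points_per_coefficient(T, k, p):
    # T observations on k series give T * k data points in total,
    # so the ratio reduces to T / (p * k).
    return (T * k) / var_coefficient_count(k, p)

for k, p in [(3, 4), (6, 4), (10, 4)]:       # hypothetical model sizes
    n_coef = var_coefficient_count(k, p)
    for T in (100, 150):
        ratio = data_points_per_coefficient(T, k, p)
        print(f"k={k}, p={p}, T={T}: {n_coef} coefficients, "
              f"{k * p} regressors per equation, {ratio:.1f} data points per coefficient")
```

Because the count grows with the square of $k$, adding variables to a VAR uses up a quarterly sample far more quickly than adding an equation with a few selected regressors to a simultaneous equations model.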

As stated above, constraints are placed upon $A_0, A_1, \ldots$ in the simultaneous equations model (11.1), while they need not be on $B_1, B_2, \ldots$ in the VAR model (11.2). This may lead us to think that the simultaneous equations models are logical specializations of the VAR models. However, if our consideration is confined to those models that can reasonably be estimated with 100 or 150 observations of quarterly data, the simultaneous equations model can contain a larger number of variables (denoted by $k$ above) than the VAR models do. This is because the effective number of parameters in $A_0, A_1, \ldots$ is much smaller than $pk^2$, even if it is assumed that the same lag order $p$ is adopted between the simultaneous equations and VAR. The 'feasible' VAR models are not necessarily logical generalizations of 'feasible' simultaneous equations models.

If feasible simultaneous equations models are logical specializations of a feasible VAR model, then the VAR model may be used as a benchmark to test the simultaneous equations models. In the VAR model $B_1, \ldots, B_p$ are just-identified. On the other hand, some of the constraints on $A_0, A_1, \ldots$ allegedly derived from economic theories may imply constraints upon $A_0^{-1}A_1 = B_1, \ldots, A_0^{-1}A_p = B_p$, leading to overidentifying restrictions upon the VAR coefficient matrices, $B_1, \ldots, B_p$. This approach has been taken in Hendry and Mizon (1993). The basic logic here is identical to that mentioned in the last paragraph of 11.1.3 above.

1 The determination of lag orders is an important problem. See Lütkepohl (1991: 118-50) for stationary multivariate models, and Morimune and Mantani (1993) for non-stationary models.

11.1.5 Intermediate positions

Admitting that economic theories do not identify completely $A_0, A_1, \ldots$, and $\Omega$, the theories do divide the entire space of parameter values into a number of observationally equivalent classes. Even though each class may comprise a number of parameter points, some aspects of economic theories can be tested empirically if each class is such that either all members of the class admit the aspects or no members of the class admit the aspects; in short, if the aspects are properties of the observationally equivalent classes.2 This approach to save the simultaneous equations model from Sims's criticism has not been explored in practice to the best of my knowledge, but it has been found in the co-integration analysis that some aspects of economic theories are properties of the observationally equivalent classes. See Section 15.2.1 below.

Recently a modelling strategy called the structural VAR has been proposed in Bernanke (1986), Blanchard and Quah (1989), Blanchard (1989), King et al. (1991). In my judgement the structural VAR is a variant of the simultaneous equations model rather than a VAR. It does not assume $A_0 = I$ in (11.1), and moreover employs some constraints on the parameters $A_0, A_1, \ldots$, and $\Omega$ just sufficient to identify them completely. Common to all the structural VARs is the specification that $\Omega = E(u_t u_t')$ is diagonal in (11.1) while assuming that $\{u_t\}$ is i.i.d. In other words, shocks in different structural equations, for example, demand shocks and supply shocks, are uncorrelated at all leads and lags. It has been known in the simultaneous equations literature that this is not sufficient for complete identification of $A_0, A_1, \ldots$, and indeed additional constraints are introduced.

What deserves our particular attention is Blanchard and Quah (1989) and King et al. (1991). In Blanchard and Quah (1989) and also Gali (1992) constraints are placed upon the impact matrix of shocks in different structural equations, $u_t$, upon long-run components of variables $x_t$. Conceptually this matrix is

in (11.1), but it is easier to work along the methods given in Chapter 12. King et al. (1991) derive particular co-integration properties (rank and structure of the co-integration space) from an economic theory. This implication upon the long-run impact of shocks is then used for the identification of parameters in the simultaneous equations. This will be further explained in Section 15.5. Models in Lee and Hamada (1991) and Ahmed et al. (1993) are simultaneous equations models on the relationships among long-run components of different variables.

2 Phillips (1989) expresses a similar suggestion by some reparametrization.

In contrast to the standard VAR the structural VAR model achieves identification of its parameters through economic theories,3 and thus may be subjected to criticisms such as Sims (1980b) about the validity of economic theories. However, the constraints due to long-run economic theories are less controversial than those due to the short-run dynamics. Moreover, the standard VAR is not without a drawback in other respects, as mentioned earlier. One way to test the economic theories is to just-identify the model by constraints derived from a portion of the theories and then test the validity of the rest against the observations. In my judgement there is no unequivocal choice between the standard VAR and the simultaneous equations model.4

It is the (standard rather than structural) VAR models that are adopted in most parts of Part II for the explanation of co-integration analysis. Economic theories are considered in implementing the co-integration analysis, but structural equations are not. This strategy is found adequate because many long-run economic relations do not directly deal with structural equations (for instance, demand function or supply function), but with some equations derived from them through the equilibrium conditions. However, we shall need some consideration analogous to the simultaneous equations in Section 15.2.1, where we discuss the unidentified co-integration matrix to be denoted by $B'$. The simultaneous equations models are briefly touched on in Section 13.1.4, but by and large unexplored in the present book. The long-run version of structural VAR models may well be an important topic for future research.

11.2 Weak Exogeneity

Most of the models in econometrics are conditional upon exogenous variables, and the exogenous variables are not explicitly modelled. The concept of weak exogeneity provides conditions under which we can perform efficient inference on the conditional model without employing the entire model. The purpose of the present section is to give a brief explanation of weak exogeneity to the extent that is necessary for the reading of Part II. Readers are referred to Engle, Hendry, and Richard (1983) for more about weak exogeneity and to Ericsson (1992) for a good expository presentation.

3 Sims (1980b) achieves identification of $A_0, A_1, \ldots$ in (11.1) by a particular causality ordering among different shocks, which ordering however is not founded on economic theories. This interpretation of Sims (1980b) has been provided in Bernanke (1986).

4 I am indebted to suggestions from Yosuke Takeda on the writing of the present and previous paragraphs.

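For illustration consider a bivariate normal distribution of $x' = (x_1, x_2)$ with mean vector $\mu' = (\mu_1, \mu_2)$ and covariance matrix

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}.$$

The parameters are $(\mu_1, \mu_2, \sigma_{11}, \sigma_{12}, \sigma_{22})$. The p.d.f. of $x$ is

$$f(x) = (2\pi)^{-1}\det[\Sigma]^{-1/2}\exp\{-\tfrac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\},$$

where $\det[\Sigma]$ is the determinant of $\Sigma$. This can be expressed as $f(x_2|x_1)f(x_1)$, where $f(x_2|x_1)$ is the conditional p.d.f. of $x_2$ upon $x_1$, i.e.

$$f(x_2|x_1) = (2\pi\sigma_{22.1})^{-1/2}\exp\{-(x_2 - \mu_{2.1} - \beta x_1)^2/(2\sigma_{22.1})\}, \qquad (11.3)$$

with $\beta = \sigma_{12}\sigma_{11}^{-1}$, $\mu_{2.1} = \mu_2 - \beta\mu_1$, and $\sigma_{22.1} = \sigma_{22} - \sigma_{12}^2\sigma_{11}^{-1}$. Moreover, $f(x_1)$ is the marginal p.d.f. of $x_1$, i.e.

$$f(x_1) = (2\pi\sigma_{11})^{-1/2}\exp\{-(x_1 - \mu_1)^2/(2\sigma_{11})\}. \qquad (11.4)$$

The conditional p.d.f. contains three parameters, $\beta$, $\sigma_{22.1}$ and $\mu_{2.1}$, and the marginal p.d.f. contains two parameters, $\sigma_{11}$ and $\mu_1$. Suppose that $x_t' = (x_{1t}, x_{2t})$ is distributed as $x$ given above and $\{x_t\}$ is i.i.d. Then over the sample period $t = 1, \ldots, T$ the log-likelihood is

$$\sum_{t=1}^{T}\log f(x_{2t}|x_{1t}) + \sum_{t=1}^{T}\log f(x_{1t}). \qquad (11.5)$$

The first term on the right-hand side of (11.5) contains parameters $(\beta, \sigma_{22.1}, \mu_{2.1})$, and the second term has $(\sigma_{11}, \mu_1)$. The variations of $(\beta, \sigma_{22.1}, \mu_{2.1})$ are free from $(\sigma_{11}, \mu_1)$ so that the differentiation of (11.5) with respect to $(\beta, \sigma_{22.1}, \mu_{2.1})$ is in no way constrained by $(\sigma_{11}, \mu_1)$.

Suppose that we are concerned with the estimation of $\beta$ only. Since $\beta$ is not contained in the second term of the right-hand side of (11.5), the maximum-likelihood estimation of $\beta$ is performed by maximizing the first term only, which comes from the conditional p.d.f., (11.3). The equation that is associated with (11.3) is

$$x_{2t} = \mu_{2.1} + \beta x_{1t} + e_{2t}, \qquad (11.6)$$

where $e_{2t}$ is normal with mean zero and variance $\sigma_{22.1}$ and is independent of $x_{1t}$. The marginal p.d.f. of $x_1$, (11.4), is irrelevant to the maximum-likelihood estimation of $\beta$. Then it is said that $x_1$ is weakly exogenous in (11.6) with respect to $\beta$. The $\beta$ in this situation is called the parameter of interest. The equation (11.6) is called the conditional model, and

$$x_{1t} = \mu_1 + e_{1t}, \qquad (11.7)$$

where $e_{1t}$ is normal with mean zero and variance $\sigma_{11}$, is called the marginal model.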
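The claim that the marginal model is irrelevant to the maximum-likelihood estimation of the conditional parameters can be checked numerically. The sketch below is an illustration under assumed parameter values (the mean vector, covariance matrix, seed, and sample size are all hypothetical). It compares the estimates obtained from the full bivariate normal likelihood with those obtained from least squares on the conditional model alone.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                    # hypothetical (mu_1, mu_2)
Sigma = np.array([[2.0, 0.8], [0.8, 1.5]])    # hypothetical covariance matrix
x = rng.multivariate_normal(mu, Sigma, size=500)
x1, x2 = x[:, 0], x[:, 1]

# Joint ML: sample mean and divisor-T covariance, then map to (beta, mu_2.1, sigma_22.1).
m = x.mean(axis=0)
S = (x - m).T @ (x - m) / len(x)
beta_joint = S[0, 1] / S[0, 0]
mu21_joint = m[1] - beta_joint * m[0]
s221_joint = S[1, 1] - S[0, 1] ** 2 / S[0, 0]

# Conditional ML: least squares of x2 on (1, x1), ignoring the marginal model of x1.
X = np.column_stack([np.ones_like(x1), x1])
coef, *_ = np.linalg.lstsq(X, x2, rcond=None)
resid = x2 - X @ coef
mu21_cond, beta_cond = coef
s221_cond = resid @ resid / len(x2)

print(beta_joint, beta_cond)
```

The two routes give the same estimates up to floating-point error, because the conditional and marginal parameters are variation-free and enter separate terms of the log-likelihood.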

It is always important to specify the parameter of interest when one mentions weak exogeneity. In fact in the above example $x_1$ is weakly exogenous in (11.6) with respect to $(\beta, \mu_{2.1}, \sigma_{22.1})$ instead of $\beta$ alone.

The formal definition of weak exogeneity is given in Engle, Hendry, and Richard (1983) as follows. Let $\{x_t\}$ be a vector stochastic process, and suppose that the joint p.d.f. of $(x_T, x_{T-1}, \ldots, x_1, x_0)$ is

$$\prod_{t=0}^{T} f(x_t | x_{t-1}, \ldots, x_1, x_0), \qquad (11.8)$$

where the factor for $t = 0$ is the marginal p.d.f. of $x_0$. This expression as a sequence of products of the conditional p.d.f.s is always possible, and in fact frequently used in time-series analysis. Throughout $t = T, \ldots, 1, 0$, $x_t$ is partitioned identically as $x_t' = (x_{1t}', x_{2t}')$. Then (11.8) is written

$$\prod_{t=0}^{T} f(x_{2t} | x_{1t}, x_{t-1}, \ldots, x_1, x_0)\, f(x_{1t} | x_{t-1}, \ldots, x_1, x_0). \qquad (11.9)$$

Moreover, assume that $f(x_{2t} | x_{1t}, x_{t-1}, \ldots, x_1, x_0)$, $t = T, \ldots, 1, 0$, contains the vector parameter $\lambda_1$; that $f(x_{1t} | x_{t-1}, \ldots, x_1, x_0)$, $t = T, \ldots, 1, 0$, contains the vector parameter $\lambda_2$; and that $\lambda_1$ and $\lambda_2$ are variation-free, i.e. the variations of $\lambda_1$ are in no way constrained either through equalities or inequalities by $\lambda_2$ and vice versa.5 (11.9) is expressed as

$$\prod_{t=0}^{T} f(x_{2t} | x_{1t}, x_{t-1}, \ldots, x_1, x_0; \lambda_1) \prod_{t=0}^{T} f(x_{1t} | x_{t-1}, \ldots, x_1, x_0; \lambda_2).$$

If $\lambda_1$ is the parameter of interest, $x_{1t}$ is weakly exogenous in the conditional model representing $f(x_{2t} | x_{1t}, x_{t-1}, \ldots, x_1, x_0; \lambda_1)$, $t = T, \ldots, 1, 0$. The maximum-likelihood estimation of $\lambda_1$ can be based upon the conditional model alone. The marginal model is irrelevant to $\lambda_1$.

When $\lambda' = (\lambda_1', \lambda_2')$ is reparametrized into $\psi$ as is often done in econometrics, $x_{1t}$ remains weakly exogenous with respect to any elements of $\psi$ that can be expressed by $\lambda_1$ alone.

Nowhere in the above explanation is the stationarity of $\{x_t\}$ assumed. In fact Banerjee et al. (1993: 245-52) demonstrate the importance of the weak exogeneity in the models in which $I(1)$ variables are involved.

Let us consider the weak exogeneity in the simultaneous equations model (11.1) with lag order $p$,

(11.10)

where $\{u_t\}$ is Gaussian i.i.d. with $E(u_t) = 0$ and $E(u_t u_t') = \Omega$ assumed to be positive definite. $\{x_t\}$ may be non-stationary with unit roots. We partition $x_t'$ as $(x_{1t}', x_{2t}')$ with $k_1$ and $k_2$ elements in $x_{1t}$ and $x_{2t}$ respectively.

I shall proceed as though $k_1 = k_2 = 1$, but its extension to any $k_1$ and $k_2$ is straightforward. Let us write

Then the model is

(11.11a)

(11.11b)

I shall use notations

Then

Consider then

Using (11.11a), and noting that $u_t$ is uncorrelated with $x_{t-1}, x_{t-2}, \ldots, x_0$, this is equal to

but

(11.12)

Therefore

(11.13)

The variance of $x_{2t}$ conditional upon $x_{1t}, x_{t-1}, x_{t-2}, \ldots, x_0$ is

(11.14)

Since $\{x_t\}$ is Gaussian, the conditional mean and conditional variance determine the conditional p.d.f. $f(x_{2t} | x_{1t}, x_{t-1}, \ldots, x_0)$.

5 Ericsson (1992) presents a nice explanation of variation-free parameters on the stable cobweb model.

So far no particular assumptions have been introduced, but let us now introduce two assumptions.

Assumption 1 $\omega_{12} = 0$.

Assumption 2 $a_{12}^{(0)} = 0$.

Then $c_1 = ((a_{11}^{(0)})^{-1}, 0)$, $c_2 = (-(a_{22}^{(0)})^{-1}a_{21}^{(0)}(a_{11}^{(0)})^{-1}, (a_{22}^{(0)})^{-1})$. (11.12) is reduced to just $(u_{1t}, 0)'$. (11.13) is now,

(11.13')

and (11.14) is

(11.14')

The mean and variance of $\{x_{2t}\}$ conditional upon $(x_{1t}, x_{t-1}, x_{t-2}, \ldots)$, (11.13') and (11.14'), contain $a_2^{(i)}$, $i = 0, 1, \ldots, p$ and $\omega_{22}$ only. On the other hand the mean and variance of $\{x_{1t}\}$ conditional upon $(x_{t-1}, x_{t-2}, \ldots)$ under Assumptions 1 and 2 contain $a_{11}^{(0)}$, $a_1^{(i)}$, $i = 1, \ldots, p$ and $\omega_{11}$, but do not contain $a_2^{(i)}$, $i = 0, 1, \ldots, p$ and $\omega_{22}$. The only additional assumption needed for the weak exogeneity is

Assumption 3 There exist no a priori constraints that relate $a_1^{(i)}$, $i = 0, 1, \ldots, p$ and $\omega_{11}$ to $a_2^{(i)}$, $i = 0, 1, \ldots, p$ and $\omega_{22}$.

Under Assumptions 1, 2, and 3 $x_{1t}$ is weakly exogenous in the conditional model,

with respect to $a_2^{(i)}$, $i = 0, 1, \ldots, p$ and $\omega_{22}$. The result may be regarded as a special case of Theorem 4.3 of Engle, Hendry, and Richard (1983), which however deals with a rather complicated model.

Assumptions 1, 2, and 3 are only sufficient conditions, but I do not know of any weaker sufficient conditions.6 In particular, $x_{1t}$ being predetermined is not sufficient for an efficient estimation of $a_2^{(i)}$ and $\omega_{22}$, and therefore not sufficient for the weak exogeneity. Even though the predetermined $x_{1t}$ yields a consistent estimation of $a_2^{(i)}$ and $\omega_{22}$, these parameters are contained in the marginal model as well. When Assumptions 1 and 2 hold, the simultaneous equations model is called recursive.

6 For the stationary simultaneous equations model Hatanaka and Odaki (1983) shows that $x_{1t}$ being predetermined in the second equation, i.e. $E(x_{1t}u_{2t}) = 0$, is equivalent to $\omega_{12} = a_{12}^{(0)}(a_{22}^{(0)})^{-1}\omega_{22}$, which in turn is equivalent to the conditions, (11.13') and (11.14'), of the conditional model. However, this is not sufficient for the weak exogeneity, because the marginal model still contains $a_2^{(i)}$ and $\omega_{22}$ unless either Assumption 1 or Assumption 2 is introduced. Needless to say, Assumption 3 is always needed for the weak exogeneity.

Neither Assumption 1 nor Assumption 2 can be verified by the data alone. Some a priori knowledge is needed to justify them because the parameters involved are not identified without such knowledge.

The VAR model (11.2) in the case of $k = 2$ is obtained by making $A^{(0)} = I_2$ in (11.10). Then $c_1 = (1, 0)$ and $c_2 = (0, 1)$. (11.12) becomes $E(u_t | c_1 u_t) = (1, \omega_{21}\omega_{11}^{-1})' u_{1t}$, and (11.13) becomes

The conditional variance is $\omega_{22} - \omega_{21}\omega_{11}^{-1}\omega_{12}$. The conditional model may be written as

(11.15)

and the marginal model for $x_{1t}$ is

(11.16)

The original parameters, $a_1^{(i)}$, $a_2^{(i)}$, $i = 1, \ldots, p$, $\omega_{11}$, $\omega_{12}$, $\omega_{22}$ may be reparametrized into $a_1^{(i)}$, $a_{2.1}^{(i)} = a_2^{(i)} - \omega_{21}\omega_{11}^{-1}a_1^{(i)}$, $i = 1, \ldots, p$, $\omega_{11}$, $\beta_{2.1} = \omega_{21}\omega_{11}^{-1}$, $\omega_{22.1} = \omega_{22} - \omega_{21}\omega_{11}^{-1}\omega_{12}$. Then $x_{1t}$ is weakly exogenous in (11.15) with respect to $a_{2.1}^{(i)}$, $i = 1, \ldots, p$, $\beta_{2.1}$, $\omega_{22.1}$.

11.3 Granger Non-Causality

Granger (1969) considered the causality of $y$ on $x$ in regard to two optimal (minimum mean squared error) predictions of $x_t$, (i) using the information set of $x_{t-1}, x_{t-2}, \ldots$ and (ii) using the information set of $(x_{t-1}, y_{t-1}), (x_{t-2}, y_{t-2}), \ldots$. The second prediction never entails a mean squared error larger than the first prediction, but it is possible that two predictions have identical mean squared errors. When that occurs $y$ fails to cause $x$ as the information of $(y_{t-1}, y_{t-2}, \ldots)$ does not contribute to the prediction of $x_t$. This definition of (non-)causality may well be a misleading wording as the causality should refer to some intrinsic property of a system rather than a prediction. The concept, 'Granger (non-)causality', was adopted in Sims (1972) to allow for this point, and has since been accepted by economic theorists and econometricians. It must be pointed out that prediction does reflect whatever one identifies as the intrinsic property.

Granger non-causality has since been found indispensable in economic theories to discern different channels in which expectation is formed by economic agents and markets. The concept has also played an important role in many empirical studies dealing with expectation. We shall need the concept of Granger non-causality to explain the framework of co-integration in Chapter 12 and the inference theory in Chapter 13. The present section provides a brief explanation only to the extent that is necessary for the reading of Part II. Readers are referred to Granger (1969), Sims (1972), Geweke (1984), Granger and Newbold (1986, chs. 7 and 8), and Granger (1988b) for more about Granger non-causality, and Stock and Watson (1989) and Friedman and Kuttner (1993) for recent empirical studies in which Granger causality is examined.

If $\{(x_t, y_t)\}$ starts at $t = 0$, the optimal prediction of $x_t$ from $(x_{t-1}, \ldots, x_0)$ is $E(x_t | x_{t-1}, \ldots, x_0)$, and the optimal prediction of $x_t$ from $(x_{t-1}, y_{t-1}, \ldots, x_0, y_0)$ is $E(x_t | x_{t-1}, y_{t-1}, \ldots, x_0, y_0)$. The variances of errors in the two predictions are identical if and only if

$$E(x_t | x_{t-1}, \ldots, x_0) = E(x_t | x_{t-1}, y_{t-1}, \ldots, x_0, y_0) \qquad (11.17)$$

holds for all realizations of $(x_{t-1}, y_{t-1}, \ldots, x_0, y_0)$. Thus $y$ failing to cause $x$ in Granger's sense is defined by (11.17) for all $t \geq 1$. If $\{(x_t, y_t)\}$ starts at $t = -\infty$, the conditioning variables in (11.17) must be replaced by $(x_{t-1}, \ldots, x_0, x_{-1}, \ldots)$ on the left-hand side and by $(x_{t-1}, y_{t-1}, \ldots, x_0, y_0, x_{-1}, y_{-1}, \ldots)$ on the right-hand side, and some specifications on $\{(x_t, y_t)\}$ would be required to perform our reasoning on the conditional expectations.

Suppose that

(11.18)

is a bivariate, possibly non-stationary VAR. The reasoning made on (11.10) can be carried over here by setting $A^{(0)}$ in (11.10) to $I_2$ and equating $A^{(i)}$ to $B^{(i)}$ in (11.18). We write

$E(v_t v_t') = \Sigma$ is positive definite, but $\{v_t\}$ need not be Gaussian. It will be shown that $x_{2t}$ does not cause $x_{1t}$ in Granger's sense if and only if $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$. Proofs for the case where $\{x_t\}$ is stationary and also for the case where $\{x_t\}$ is non-stationary and not co-integrated will be shown below. The case where $\{x_t\}$ is co-integrated will be discussed in Section 12.2.4.

Suppose that (11.18) is initiated by some random variables $(x_{p-1}, \ldots, x_0)$. From the first equation of (11.18)

(11.19a)

(11.19b)

As stated in (11.17) $x_{2t}$ does not cause $x_{1t}$ in Granger's sense if and only if (11.19a) and (11.19b) are identical with probability 1. The difference between the two is

(11.20)

If (11.18) is a stationary VAR with stationary moments in the initials $(x_{p-1}, \ldots, x_0)$,7 it is easy to prove that (11.20) vanishes with probability 1 if and only if $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$. First of all, letting $\xi_{t-1}' = (x_{t-1}', x_{t-2}', \ldots, x_0')$, it is known that $E(\xi_{t-1}\xi_{t-1}')$ is positive definite. Since $\eta_{t-1}' = (x_{2t-1}, \ldots, x_{2t-p})$ and $\zeta_{t-1}' = (x_{1t-1}, x_{1t-2}, \ldots, x_{10})$ are both subvectors of $\xi_{t-1}$, the covariance matrix of $\eta_{t-1}$ conditional upon $\zeta_{t-1}$ is positive definite. Therefore $(x_{2t-j} - E(x_{2t-j} | x_{1t-1}, x_{1t-2}, \ldots, x_{10}))$, $j = 1, \ldots, p$, is a $p$-dimensional, non-degenerate random vector. (11.20) vanishes if and only if $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$. Thus $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$, is necessary and sufficient for $x_{2t}$ failing to cause $x_{1t}$ in Granger's sense. This result can be obtained also when (11.18) starts at $t = -\infty$.

The above result is implicit in Granger (1969), and has been the most widely used version of Granger non-causality. Since $B^{(1)}, \ldots, B^{(p)}$ are identified, Granger non-causality can be tested with the observation of $\{x_t\}$.
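Because the condition is a zero restriction on identified VAR coefficients, it can be examined by comparing restricted and unrestricted least-squares fits of the $x_1$ equation. The sketch below is a minimal illustration for the stationary case only; the coefficient matrices, seed, sample size, and function names are hypothetical, and the statistic is an ordinary F ratio rather than a procedure taken from the text.

```python
import numpy as np


def simulate_var1(B, T, rng):
    """Simulate x_t = B x_{t-1} + v_t with standard normal v_t and x_0 = 0."""
    x = np.zeros((T + 1, 2))
    for t in range(1, T + 1):
        x[t] = B @ x[t - 1] + rng.standard_normal(2)
    return x


def rss(X, y):
    """Residual sum of squares from least squares of y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((y - X @ coef) ** 2))


def granger_f_stat(x, p=1):
    """F ratio for H0: lags 1..p of x2 do not enter the x1 equation."""
    T = len(x)
    y = x[p:, 0]
    own = np.column_stack([x[p - i:T - i, 0] for i in range(1, p + 1)])
    other = np.column_stack([x[p - i:T - i, 1] for i in range(1, p + 1)])
    const = np.ones((T - p, 1))
    X_r = np.hstack([const, own])            # restricted: b12 = 0
    X_u = np.hstack([const, own, other])     # unrestricted
    rss_r, rss_u = rss(X_r, y), rss(X_u, y)
    dof = len(y) - X_u.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)


rng = np.random.default_rng(3)
B_null = np.array([[0.5, 0.0], [0.3, 0.4]])   # b12 = 0: x2 does not cause x1
B_alt = np.array([[0.5, 0.4], [0.3, 0.4]])    # b12 != 0
f_null = granger_f_stat(simulate_var1(B_null, 400, rng))
f_alt = granger_f_stat(simulate_var1(B_alt, 400, rng))
print(f_null, f_alt)
```

In the non-stationary cases discussed later the distribution of such a statistic needs separate treatment, so the sketch should not be read as a general testing recipe.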

Continuing with the stationary case of (11.18), it can be inverted into a VMA (vector moving-average process),

(11.21)

If $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$ in (11.18), then $c_{12}^{(i)} = 0$, $i = 0, 1, \ldots$, in (11.21).

Let us consider then a VMA (11.21), not necessarily one that is obtained through inversion of a VAR. $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$. We shall need later the following theorem.

Theorem 11.1 (Sims (1972)). Suppose that (11.21) starting at $t = -\infty$ is a stationary, linear, indeterministic process with $E(\varepsilon_t\varepsilon_t') = I_2$.8 Then $x_2$ does not cause $x_1$ in Granger's sense if and only if we can choose either $c_{11}^{(i)} = 0$, $i = 0, 1, 2, \ldots$ or $c_{12}^{(i)} = 0$, $i = 0, 1, 2, \ldots$.

The proof in Sims (1972) uses some advanced mathematics. Here I try to show plausibility of the theorem. First of all, the italicized words, we can choose, need explanation. In general two different stochastic processes may be regarded identical if they share the same first- and second-order moments. (The second-order moments are the sequence of autocovariance matrices.) The same VMA process can then be represented in different ways even when we restrict $E(\varepsilon_t\varepsilon_t')$ to $I_2$. Suppose that a bivariate process with $E(x_t) = 0$ and a given autocovariance matrix sequence is derived for the process (11.21). Then a process with the same first- and second-order moments as (11.21) can be described by

if $\tilde{C}^{(i)} = C^{(i)}H$, $u_t = H'\varepsilon_t$, and $H$ is an orthogonal matrix. (In short, the $C$s in VMA (11.21) are not identified when differences among probability distributions are judged in terms of only the first- and second-order moments, which is justified if the process is Gaussian.) In the statement of Theorem 11.1 the conditions, $c_{11}^{(i)} = 0$, $i = 0, 1, 2, \ldots$, and/or $c_{12}^{(i)} = 0$, $i = 0, 1, 2, \ldots$ mean properties of one of many representations that all denote the process with the same first- and second-order moments.

Second, let us illustrate Theorem 11.1 with the first-order VMA,

Denoting $(c_{11}^{(i)}, c_{12}^{(i)})$ by $c_1^{(i)}$, it is seen that

Therefore $x_2$ does not cause $x_1$ in Granger's sense if and only if there exists a non-zero scalar $d$ such that $c_1^{(1)} = dc_1^{(0)}$. (Of course, $c_1^{(1)} = c_1^{(0)} = 0$ is precluded from our consideration.) The condition $c_1^{(1)} = dc_1^{(0)}$ is equivalent to existence of a vector $f$ such that $\|f\| = 1$ and $c_1^{(1)}f = c_1^{(0)}f = 0$. Form an orthogonal matrix, either $[f, g]$ or $[g, f]$, with $g$ such that $g'f = 0$ and $\|g\| = 1$, and adopt it for $H$ above. Then either $\tilde{c}_{11}^{(0)} = \tilde{c}_{11}^{(1)} = 0$ or $\tilde{c}_{12}^{(0)} = \tilde{c}_{12}^{(1)} = 0$.9

QED

In Chapter 13 I shall need another theorem on Granger non-causality.

Theorem 11.2 (Sims (1972)). Suppose that (11.21) is a stationary, linear, indeterministic process with $E(\varepsilon_t\varepsilon_t') = \Sigma$, which is positive definite. Then $x_2$ does not cause $x_1$ in Granger's sense if and only if the distributed lag relation

(11.22)

holds with $E(w_t) = 0$ and $E(w_t x_{1t-s}) = 0$ for all positive, zero, negative values of $s$.

7 A word of explanation may be needed on stationary moments of initials. For illustration I use a stationary AR(1),

If the process starts from $t = -\infty$ and $(x_1, \ldots, x_T)$ are observed,

Suppose that the process starts from $t = 0$ but

Then we still have the moments on $x_t$ given above. We say that $x_0$ has the stationary mean $(1 - a)^{-1}\mu$ and the stationary variance $\sigma^2(1 - a^2)^{-1}$.

8 A stationary process $\{x_t\}$ is said to be indeterministic if there is no part of $x_t$ that can be accurately predicted from $(x_{t-1}, x_{t-2}, \ldots)$.

9 A nice exercise is to extend the above reasoning to VMA in lag order 2. It seems that Granger non-causality implies a complicated condition in the vector ARMA.
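The rotation argument used for the first-order VMA illustration of Theorem 11.1 can be checked numerically. The sketch below assumes a process of the form $x_t = C^{(0)}\varepsilon_t + C^{(1)}\varepsilon_{t-1}$ with $E(\varepsilon_t\varepsilon_t') = I_2$; the numerical entries of $C^{(0)}$, $C^{(1)}$ and the scalar $d$ are hypothetical. It builds $H = [f, g]$ as described and confirms that the rotated representation has zeros in the $(1,1)$ positions while the autocovariances are unchanged.

```python
import numpy as np

# First rows of C^(0) and C^(1); by construction c1_1 = d * c1_0 (hypothetical values).
c1_0 = np.array([0.6, 0.8])
d = -0.5
c1_1 = d * c1_0

# Unit vector f orthogonal to c1_0 (hence to c1_1), and unit vector g orthogonal to f.
f = np.array([-c1_0[1], c1_0[0]]) / np.linalg.norm(c1_0)
g = c1_0 / np.linalg.norm(c1_0)
H = np.column_stack([f, g])            # orthogonal matrix [f, g]

# Full coefficient matrices; the second rows are arbitrary illustrative numbers.
C0 = np.vstack([c1_0, [0.3, 1.0]])
C1 = np.vstack([c1_1, [0.2, -0.4]])
C0_rot, C1_rot = C0 @ H, C1 @ H        # rotated representation C^(i) H

# Autocovariances of a VMA(1) with identity innovation covariance.
gamma0 = C0 @ C0.T + C1 @ C1.T
gamma1 = C1 @ C0.T
gamma0_rot = C0_rot @ C0_rot.T + C1_rot @ C1_rot.T
gamma1_rot = C1_rot @ C0_rot.T

print(np.round(C0_rot, 6))
print(np.round(C1_rot, 6))
```

Using $[g, f]$ instead of $[f, g]$ would move the zeros to the $(1,2)$ positions, which corresponds to the other alternative in the theorem.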

The original proof of Sims (1972) relies on some advanced mathematics. I shall follow Yamamoto (1988: 170-1) to demonstrate a proof of 'only if', and a proof of 'if' will also be given later.

Let us start with the VAR, (11.18), assuming its stationarity and $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$. It is known that such VAR models generate linear indeterministic processes. Appendix 5 gives a relevant determinantal equation for the stationary VAR, and when $x_2$ does not cause $x_1$ in Granger's sense it is simplified to

(11.23)

where

(11.23) is equivalent to

(11.23')

and because of the stationarity the roots of $b_{11}(\lambda) = 0$ and $b_{22}(\lambda) = 0$ must all be larger than unity in moduli. In particular both $b_{11}(L)$ and $b_{22}(L)$ are invertible.

Recalling $E(v_t v_t') = \Sigma$ in (11.18), premultiply (11.18) by

$$S = \begin{pmatrix} 1 & 0 \\ -\sigma_{12}\sigma_{11}^{-1} & 1 \end{pmatrix}$$

and let $\varepsilon_t = Sv_t$. Since $\{v_t\}$ is i.i.d., so is $\{\varepsilon_t\}$, and $E(\varepsilon_t\varepsilon_t') = \mathrm{diag}[\sigma_{11}, \sigma_{22} - \sigma_{11}^{-1}\sigma_{12}^2]$. Therefore $\varepsilon_{1t}$ and $\varepsilon_{2s}$ are uncorrelated for all $t$ and $s$. Since $x_{1t} = b_{11}(L)^{-1}\varepsilon_{1t}$, $x_{1t}$ and $\varepsilon_{2s}$ are also uncorrelated for all $t$ and $s$.

After premultiplication by $S$ the second equation of (11.18) is

from which we obtain

This is in the form of (11.22) with $w_t = b_{22}(L)^{-1}\varepsilon_{2t}$. The zero correlation between $x_{1t}$ and $w_s$ for all $t$ and $s$ follows from the zero correlation between $x_{1t}$ and $\varepsilon_{2s}$ for all $t$ and $s$. It is important to remember that in general $\{w_t\}$ in (11.22) is not i.i.d.

Conversely suppose that the distributed lag relation (11.22) holds with the condition $E(w_t) = 0$ and $E(w_t x_{1t-s}) = 0$ for all $s$. In general the conditional expectation in the linear process is given by the projection. Here (11.22) implies that the space spanned by $(x_{1t-1}, x_{2t-1}, x_{1t-2}, x_{2t-2}, \ldots)$ is also spanned by $(x_{1t-1}, w_{t-1}, x_{1t-2}, w_{t-2}, \ldots)$. Moreover, $x_{1t}$ is uncorrelated with $w_{t-1}, w_{t-2}, \ldots$.

Therefore

i.e., $x_2$ does not cause $x_1$ in Granger's sense. QED

Let us turn to Granger non-causality in the non-stationary case. Suppose that

in (11.18)

(11.24)

so that

(11.24')

Let $b_{jh}^{(i)}$ and $\tilde{b}_{jh}^{(i)}$, $j, h = 1, 2$, be the $(j, h)$ element of $B^{(i)}$ and $\tilde{B}^{(i)}$ respectively. It will be explained later in Chapter 12 that (11.24) means that $x_{1t}$ and $x_{2t}$ are not co-integrated. (11.24) enables us to rewrite (11.18) as

(11.25)

I assume that $x_0 = 0$, and that $\{\Delta x_t\}$ in (11.25) is stationary with stationary moments in initials. The information set may be represented either by $(x_{t-1}, \ldots, x_1)$ or by $(\Delta x_{t-1}, \ldots, \Delta x_1)$. Since $x_{1t-1} = \sum_{i=1}^{t-1}\Delta x_{1t-i}$, $x_{1t-1}$ is in the information set $(\Delta x_{1t-1}, \ldots, \Delta x_{11})$. Therefore

The Granger non-causality of $x_{2t}$ upon $x_{1t}$ holds if and only if

(11.26)

Reintroducing the reasoning about (11.20) and redefining $\xi_{t-1}'$ as $(\Delta x_{t-1}', \ldots, \Delta x_1')$, it is seen that (11.26) holds in (11.25) if and only if $\tilde{b}_{12}^{(i)} = 0$, $i = 1, \ldots, p - 1$, which is equivalent to $b_{12}^{(i)} = 0$, $i = 1, \ldots, p$, by virtue of (11.24').

Thus it is seen that Granger non-causality is the triangularity of $B^{(1)}, \ldots, B^{(p)}$ in the VAR (11.18).

12

Conceptual Framework of the Co-integration and its Relation to Economic Theories

The plan of the present chapter is as follows. In Section 12.1 we begin with the definition of co-integration in the MA representation as it is the easiest to understand. It will be followed in Section 12.2 by the seminal Granger representation theorem in Engle and Granger (1987). It transforms the definition into those in the VAR and the error-correction representation, and proves the equivalence among them. However, my explanation of the Granger representation theorem follows that in Engle and Yoo (1991) rather than the original. Since Engle and Yoo (1991) use mathematics of the polynomial matrix, an elementary exposition of the mathematics is provided in Appendix 5. The concept of common trends will be explained in both Sections 12.1 and 12.2. The theoretical structure of the Granger representation theorem is illustrated with economic interpretation by a bivariate process in Section 12.3. The error-correction form in the Granger representation is a statistical rather than an economic model. In Section 12.3, using the economic error-correction model as an example, I shall delineate the type of the long-run relationships that can be dealt with by the co-integration analysis. Also shown is how one can recover the parameter of the economic error-correction model from that of the statistical error-correction model. Highlights of Chapter 12 are given at the end to help readers to keep important results in their memory.

In the original definition in Granger (1981) an $I(1)$ vector time series has been said to be co-integrated when each element of the series is $I(1)$ individually but some linear combinations of the elements are $I(0)$. Many scholars find it more convenient for mathematical understanding to delete the above environmental condition that each element is $I(1)$ individually, thus allowing that some elements of the vector time series may be $I(0)$. This flexible definition will be adopted here. It still holds true that the co-integration relation must involve at least two $I(1)$ variables (with non-zero coefficients) if it involves any. The flexible definition also means that a co-integrating vector may have nothing to do with the long-run relationship, indicating only stationarity of some variables in the system. How to exclude such a co-integration from our consideration will be shown in Section 12.3.7.

The rank of a matrix $A$ will be denoted by $\rho(A)$, and the determinant of a square matrix $A$ by $\det[A]$.

Identification problems will be brought up frequently, and as stated in Section 11.1.2 I shall write simply a (matrix) parameter unidentified (or identified) when the parameter is unidentified (or identified) without aid of normalization rules or constraints due to economic theories. The unidentified parameter could well be identified if such aid is available.

Throughout the present chapter I shall assume that $\{\Delta x_t\}$ is stationary, i.e. the integration order is at most unity. This is an important assumption.

12.1 Co-integration in the MA Representation

The present section is a straightforward extension of Section 2.2 (Part I) to a multivariate model. Suppose that $\{x_t\}$ is a vector stochastic process with $k$ elements in the vector. We shall be concerned with the time series of $\Delta x_t = x_t - x_{t-1}$, which is assumed to be generated by a stationary vector MA with a possibly infinite order,

(12.1)

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t\varepsilon_t') = \Sigma_\varepsilon$ for all $t$. $\Sigma_\varepsilon$ is positive definite. Each of the $C$s is $k \times k$. $\Sigma_\varepsilon$ can be made $I_k$ by replacing $\varepsilon_t$ by $\Sigma_\varepsilon^{-1/2}\varepsilon_t$ and $C_j$ by $C_j\Sigma_\varepsilon^{1/2}$. However even with $\Sigma_\varepsilon = I_k$ the $C$s are not uniquely determined by the second-order moments of $\{\Delta x_t\}$ as pointed out in Section 11.3.

Denoting a norm of a matrix, say $A$, by $\|A\|$ (e.g., square root of the largest eigenvalue of $A'A$), the sequence of $\|C_j\|$ must converge to zero as $j \to \infty$ sufficiently rapidly to ensure the mathematical validity of each statement that will be presented subsequently. But we shall not bother specifying the required speed of convergence because the MA representation of the stationary vector ARMA process always meets the required conditions.

12.1.1 Co-integration space

Just as in Section 2.2 the optimal prediction of $x$ in the infinitely remote future on the basis of $(x_t, x_{t-1}, \ldots)$ is

(12.2)

from which it is seen that

(12.3)

The long-run component of $\Delta x_t$ is $(\sum_{j=0}^{\infty} C_j)\varepsilon_t$. The identity, (2.7), in Part I is extended to the multivariate case as

(12.4)

where

From (12.1) and (12.4) it follows that

(12.5)

All of the following analysis will be made conditional upon $x_0 - C^*(L)\varepsilon_0$, which is assumed to be a well-defined random variable. Since $C^*(L)\varepsilon_t$ is stationary the stochastic trend in $x_t$ can originate only in $C(1)\sum_{s=1}^{t}\varepsilon_s$. Apart from the initial effects, $x_t$ is decomposed into the long-run component $C(1)\sum_{s=1}^{t}\varepsilon_s$ and the short-run component $C^*(L)\varepsilon_t$. As in Section 2.2 we hereafter redefine the long-run component $\hat{x}_t$ as

(12.2')
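The decomposition into long-run and short-run components can be verified numerically for a finite-order MA. The sketch below rests on assumptions, because the displayed equations are not reproduced here: it takes (12.1) to be $\Delta x_t = \sum_{j=0}^{q} C_j\varepsilon_{t-j}$, takes the identity (12.4) to be $C(L) = C(1) + (1 - L)C^*(L)$ with $C^*_j = -\sum_{i>j} C_i$, and sets $x_0 = 0$. The dimensions, seed, and helper names are hypothetical.

```python
import numpy as np


def bn_split(C):
    """C has shape (q+1, k, k), holding C_0..C_q.
    Returns C(1) and C*_0..C*_{q-1}, where C*_j = -sum_{i>j} C_i (assumed form)."""
    C1 = C.sum(axis=0)
    Cstar = np.array([-C[j + 1:].sum(axis=0) for j in range(len(C) - 1)])
    return C1, Cstar


def ma_apply(coefs, eps, t, offset):
    """sum_j coefs[j] @ eps_{t-j}, where eps_s is stored in row s + offset."""
    return sum(coefs[j] @ eps[t - j + offset] for j in range(len(coefs)))


rng = np.random.default_rng(1)
k, q, T = 2, 3, 50
C = rng.standard_normal((q + 1, k, k))
C[0] = np.eye(k)
eps = rng.standard_normal((T + q + 1, k))      # eps_s for s = -q, ..., T

# Direct construction: cumulate the differences, starting from x_0 = 0.
dx = np.array([ma_apply(C, eps, t, q) for t in range(1, T + 1)])
x = np.cumsum(dx, axis=0)

# Decomposition: long-run part + short-run part - initial effect.
C1, Cstar = bn_split(C)
long_run = np.cumsum(eps[q + 1:q + 1 + T], axis=0) @ C1.T     # C(1) sum_{s<=t} eps_s
short_run = np.array([ma_apply(Cstar, eps, t, q) for t in range(1, T + 1)])
initial = ma_apply(Cstar, eps, 0, q)                          # C*(L) eps_0
x_decomposed = long_run + short_run - initial

print(np.max(np.abs(x - x_decomposed)))
```

Under these assumed forms the two constructions agree to rounding error, which illustrates why the stochastic trend can enter only through $C(1)\sum_{s=1}^{t}\varepsilon_s$.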

The co-integration as defined in Engle and Granger (1987) is concerned with the row null space of $C(1)$. Let $b$ be a $k \times 1$ non-stochastic, non-zero vector, and let $\hat{x}_t$ denote the long-run component of $x_t$. If $C(1)$ has full rank, i.e. $\rho(C(1)) = k$, then $b'C(1) \neq 0'$ for any $b$, so that $b'x_t$ is non-stationary having a stochastic trend. If $\rho(C(1)) = k - 1$, there is $b$ such that $b'C(1) = 0'$. In fact such $b$ forms a one-dimensional vector subspace. For any $b$ in the subspace $b'x_t$ is stationary and $b'\hat{x}_t = 0$, but for any $b$ not in the subspace $b'x_t$ is non-stationary and $b'\hat{x}_t \neq 0$. If $\rho(C(1)) = k - r$ $(r > 0)$, there is an $r$-dimensional vector subspace of $b$ such that $b'C(1) = 0'$. If and only if $b$ belongs to the subspace, $b'x_t$ is stationary and $b'\hat{x}_t = 0$. Finally, if $C(1)$ is zero matrix, the identity (12.4) shows that $C(L)$ has a factor, $(I - IL)$. In this case $\Delta$ in (12.1) is cancelled from both sides, and $x_t$ is stationary so that $b'x_t$ is stationary for any $b$. Moreover, $\hat{x}_t = 0$.

The $b$ that makes $b'x_t$ stationary is called a co-integrating vector, and the vector space of co-integrating vectors is the co-integration space. It is the left null space of $C(1)$. If the space has $r$ dimensions one can select an $r \times k$ matrix, $B'$, that consists of $r$ linearly independent co-integrating vectors. The $r$ is called the co-integration rank. $B'$ may be called a co-integration matrix, but $B'$ is not unique because $FB'$ is also a co-integration matrix in so far as $F$ is non-singular.
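When $C(1)$ is known, a basis of the co-integration space can be computed as the left null space of $C(1)$. The sketch below uses the singular value decomposition for this purpose; the function name, tolerance, and the numerical matrix are hypothetical, and the example is a linear-algebra illustration rather than an estimation method.

```python
import numpy as np


def cointegration_space(C1, tol=1e-10):
    """Return B' whose rows span {b : b'C(1) = 0}; the number of rows is r."""
    U, s, _ = np.linalg.svd(C1)
    rank = int(np.sum(s > tol * s.max()))
    # Columns of U beyond the rank are orthogonal to the column space of C(1).
    return U[:, rank:].T


# Hypothetical C(1) with k = 3 and rank 2 (the second row is twice the first).
C1 = np.array([[1.0, 0.0, 0.5],
               [2.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])
B_t = cointegration_space(C1)
r = B_t.shape[0]
print(r, np.round(B_t, 4))

# Any non-singular F gives another co-integration matrix F B'.
F = np.array([[2.5]])
print(np.allclose((F @ B_t) @ C1, 0.0))
```

Here the co-integration rank is one and the co-integrating vector is proportional to $(2, -1, 0)$, with the sign and scale left undetermined as the text notes.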

If $B'$ is a co-integration matrix we have a set of relations among long-run components,

$$B'\bar{x}_t = 0,$$

so that $B'$ may be thought of as representing the long-run relationship among elements of $x_t$. This idea is further developed later in Section 12.2.2 in connection with the error-correction representation. It should be pointed out, however, that a unit vector like $(0, 1, 0, \ldots, 0)$ is not precluded from $b$. If $b = (0, 1, 0, \ldots, 0)$, then the second element of $\Delta x_t$ has no long-run component, but this vector $b$ cannot be called a relationship.

I shall often use the expression that the co-integration space 'annihilates' or 'nullifies' the stochastic trends of $x_t$.


The covariance matrix of $\Delta\bar{x}_t$, i.e. the long-run covariance matrix, is $C(1)\Sigma_\varepsilon C(1)'$, or $C(1)C(1)'$ if $\Sigma_\varepsilon$ is normalized to $I_k$. Since $\Sigma_\varepsilon$ is positive definite the rank of $C(1)\Sigma_\varepsilon C(1)'$ is equal to the rank of $C(1)$, which in turn is equal to $k$ minus the dimension of the co-integration space. Unless the co-integration space is null the long-run covariance matrix is singular, indicating that the elements of the long-run component vector, $\Delta\bar{x}_t$, are linearly dependent.
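The rank statement can be checked numerically. The sketch below uses a hypothetical $C(1)$ of rank 2 and an arbitrary positive definite $\Sigma_\varepsilon$; both are assumed values chosen only for illustration.

```python
import numpy as np

# Hypothetical C(1) with k = 3 and rank k - r = 2 (third row = sum of first two).
C1 = np.array([[1.0, 0.0, 0.5],
               [0.0, 1.0, -0.5],
               [1.0, 1.0, 0.0]])

# An arbitrary positive definite innovation covariance matrix (assumed values).
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])

Omega = C1 @ Sigma @ C1.T                 # long-run covariance matrix
rank_C1 = np.linalg.matrix_rank(C1)
rank_Omega = np.linalg.matrix_rank(Omega)

# b = (1, 1, -1)' lies in the left null space of C(1), so b' Omega b = 0:
# the linear combination b' (Delta x-bar) has zero variance.
b = np.array([1.0, 1.0, -1.0])
quad_form = b @ Omega @ b
```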

Let us consider identification of the co-integration space and rank. Admittedly the $C$s are not uniquely determined by the second-order moments of $\{\Delta x_t\}$ even with $\Sigma_\varepsilon = I_k$, but, as indicated in Section 11.3 above, different matrices of $C(1)$ that produce the same second-order moments are related through $C(1)H$ with an arbitrary orthogonal matrix $H$. The left null space of $C(1)H$ is invariant through all $H$s. Therefore the co-integration space is uniquely determined from the second-order moments of $\{\Delta x_t\}$. The co-integration space is identified, and so is the rank.
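The invariance argument can be illustrated with a short sketch. The matrix `C1`, the vector `b`, and the randomly generated orthogonal matrix `H` are all hypothetical; only the invariance property itself comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical C(1) with rank 2, and a vector b such that b'C(1) = 0'.
C1 = np.array([[1.0, 0.0, 0.5],
               [0.0, 1.0, -0.5],
               [1.0, 1.0, 0.0]])
b = np.array([1.0, 1.0, -1.0])

# The QR factorization of a random square matrix yields an orthogonal factor H.
H, _ = np.linalg.qr(rng.standard_normal((3, 3)))
C1H = C1 @ H

# C(1)H reproduces the same long-run second-order moments as C(1) ...
same_moments = np.allclose(C1H @ C1H.T, C1 @ C1.T)
# ... and b is still in the left null space.
still_annihilated = np.allclose(b @ C1H, 0.0)
```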

12.1.2 Common trends

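The above definition of co-integration leads us to the concept of common trends due to Stock and Watson (1988b). Suppose that the co-integration rank is $r$. Construct a $k \times k$ orthogonal matrix $H = (H_1, H_2)$ such that (i) $H_1$ is $k \times r$ and $H_2$ is $k \times (k - r)$ and (ii) $C(1)H_1 = 0$. Then $C(1)\varepsilon_s = C(1)(H_1H_1' + H_2H_2')\varepsilon_s = C(1)H_2H_2'\varepsilon_s$. Since $\{\varepsilon_s\}$ is i.i.d. so is $\{H_2'\varepsilon_s\}$, but $H_2'\varepsilon_s$ has only $(k - r)$ elements. Let $\xi_s = H_2'\varepsilon_s$. From (12.5)

$$x_t = C(1)H_2\sum_{s=1}^{t}\xi_s + C^*(L)\varepsilon_t + x_0 - C^*(L)\varepsilon_0. \qquad (12.6)$$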

The stochastic trends $f_t = \sum_{s=1}^{t}\xi_s = H_2'\sum_{s=1}^{t}\varepsilon_s$ have only $(k - r)$ dimensions, and they are allocated among $k$ dimensions of $x_t$ through $C(1)H_2$. In other words elements of $x_t$ contain different linear combinations of the same $(k - r)$ elements of the vector common trends $f_t$. If $E(\varepsilon_t\varepsilon_t')$ is normalized to $I_k$, $E(\xi_t\xi_t') = I_{k-r}$, and the vector common stochastic trends $f_t$ consist of $(k - r)$ mutually uncorrelated scalar common trends, which would perhaps facilitate the interpretation. Comparing (12.6) and (12.5), and noting (12.2'),

$$\bar{x}_t = C(1)H_2 f_t,$$

which shows how the long-run component, $\bar{x}_t$, and common trends, $f_t$, are related. (If $\rho(C(1)) = k$, i.e. $r = 0$, $k$ variables share $k$ stochastic trends so that the word, 'common', loses its real meaning. This is the situation where $x_t$ is not co-integrated.) See also the representation of common trends given in expressions (12.25) and (12.25') below, and a summary presentation of King et al. (1991) introduced in Section 15.5. In fact this aspect of co-integration has motivated a large number of empirical applications of the co-integration analysis as shown in Section 15.5.

12.1.3 Deterministic linear trend

So far we have considered the case where $E(\Delta x_t) = 0$, and the non-stationarity has taken the form of stochastic trends. There are two different ways to introduce non-zero mean in $\Delta x_t$. The first is

$$\Delta x_t = \mu + C(L)\varepsilon_t, \qquad (12.7)$$

which gives

$$x_t = \mu t + C(1)\sum_{s=1}^{t}\varepsilon_s + C^*(L)\varepsilon_t + x_0 - C^*(L)\varepsilon_0. \qquad (12.8)$$

For $b$ to be a co-integrating vector $b'x_t$ must be stationary, and in particular must not have a deterministic linear trend with a non-zero slope. We now investigate the $k \times (k + 1)$ matrix, $[\mu, C(1)]$. If $\rho([\mu, C(1)]) = k$, there is no $b$ such that $b'\mu = 0$ and $b'C(1) = 0$.¹ Thus $b'x_t$ cannot be stationary for any $b$. If $\rho([\mu, C(1)]) = k - r$, there exists the $r$-dimensional co-integration space of $b$ such that $b'\mu = 0$ and $b'C(1) = 0$ so that $b'x_t$ is stationary. Note that $E(b'x_t) = E(b'x_0) =$ a constant for all $t$ and so is the variance of $b'x_t$.

One might like to consider separately $b'\mu = 0$ and $b'C(1) = 0$. In the terminology of Ogaki and Park (1992) the existence of $b$ such that $b'C(1) = 0$ is called 'stochastic co-integration', and the existence of $b$ such that $b'[\mu, C(1)] = 0$ 'deterministic co-integration'. The $b$ in the stochastic co-integration makes $b'x_t$ trend stationary but not necessarily stationary. The co-integration space in the deterministic co-integration annihilates the deterministic and the stochastic trends at once, while the co-integration space in the stochastic co-integration nullifies only the stochastic trends.

The second method to introduce a non-zero mean of $\Delta x_t$ is to let the innovation have a non-zero mean, $\mu$, i.e.

$$\Delta x_t = C(L)(\mu + \varepsilon_t), \qquad (12.9)$$

which leads to

$$x_t = tC(1)\mu + C(1)\sum_{s=1}^{t}\varepsilon_s + C^*(L)\varepsilon_t + x_0 - C^*(L)\varepsilon_0. \qquad (12.10)$$

It is this form that we obtain when a VAR is inverted. Here we observe the MA counterpart of what Johansen (1991a) emphasized in relation to the VAR representation of co-integration. Note how (12.8) and (12.10) differ in their first terms of the right-hand sides. Suppose that the co-integration rank is $r$. Reintroducing the orthogonal matrix $H = [H_1, H_2]$ such that $C(1)H_1 = 0$, we see that $C(1)\mu = C(1)H_2H_2'\mu$. Johansen (1991a) points out that if $H_2'\mu = 0$ the linear trend is absent in each element of $x_t$ in spite of the presence of

1 In connection with (12.7) Stock and Watson (1988b) seem to state that $b'[\mu, C(1)] = 0$ implies that $\mu$ belongs to the column space of $C(1)$, but this is incorrect. A counter-example is that $k = 3$, the second and the third columns of $C(1)$ are both some scalar multiples of the first column but $\mu$ is not. The co-integrating space is one-dimensional.


$\mu$ in (12.9). In fact $E(x_t) = E(x_0) =$ a constant vector. Another point worth mentioning regarding (12.9) is that the co-integration space can be represented by the condition $b'C(1) = 0$ alone rather than $b'[\mu, C(1)] = 0$. If $H_2'\mu \neq 0$, $k$ elements of $x_t$ share $(k - r)$ dimensions of common trends $(tH_2'\mu + H_2'\sum_{s=1}^{t}\varepsilon_s)$, which include the deterministic linear trend with a non-zero slope. The co-integration space is the left null space of $C(1)$, and the co-integrating vectors annihilate the deterministic and stochastic trends at once. We can have only 'deterministic co-integration' in the terminology of Ogaki and Park (1992).

I shall adopt (12.9) rather than (12.7) in the rest of Part II.

Incidentally it makes no sense generally to investigate the relationship among deterministic linear trends of various variables. For example, if $\{x_{it}\}$, $i = 1, 2$, has a linear trend $a_i + b_i t$, the two linear trends are perfectly correlated unless one of the $b$s is zero.
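The last remark is easy to confirm numerically. In the sketch below the intercepts and slopes are arbitrary assumed values; the sample correlation of two deterministic linear trends is $+1$ or $-1$ whenever both slopes are non-zero.

```python
import numpy as np

t = np.arange(1, 101, dtype=float)

# Two deterministic linear trends a_i + b_i t with hypothetical coefficients.
trend_1 = 2.0 + 0.5 * t
trend_2 = -1.0 - 3.0 * t

# The correlation is sign(b_1 * b_2), whatever the intercepts are.
corr = np.corrcoef(trend_1, trend_2)[0, 1]
```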

12.2 Granger Representation Theorem

So far the co-integration has been defined in the framework of the VMA (vector moving average) representation of $\Delta x_t$, assuming that the order of integration is at most unity. An important role that co-integration plays in econometrics, however, lies in its representation in the VAR and especially the error-correction form. The singularity involved in the co-integrated VMA makes its inversion into VAR non-trivial. It is the Granger representation theorem in Engle and Granger (1987) that shows the equivalence among the VMA, the VAR, and the error-correction representations of co-integration. Here I shall follow the demonstration of the theorem in Engle and Yoo (1991) (attributed to the Ph.D. dissertation of Byung Sam Yoo) rather than the original one in Engle and Granger (1987). Moreover, I shall concentrate on deriving the VAR and the error-correction representation from the VMA, thus omitting the reverse direction of derivation.² Initially $\mu$ is suppressed to zero.

A factor that enhances the importance of the error-correction representation is that it is this representation in which Johansen (1991a) develops an asymptotically efficient inference procedure. This will be explained in Chapter 15. Because of this it is easier to perform the inference in the error-correction form initially, and, if necessary, to translate the results back to the VMA representation via (12.20'), (12.25), and (12.25') below.

Mathematics of the polynomial matrix is explained in Appendix 5. Here I start where Appendix 5 ends. The $x_t$ in Appendix 5 is replaced by $\Delta x_t$. Thus the model of $\Delta x_t$ is a $k$-element VARMA (vector ARMA)

$$A(L)\Delta x_t = B(L)\varepsilon_t, \qquad (12.11)$$

which is slightly more restrictive than in Section 12.1 above. We wish to construct stationary ARMA models of $\Delta x_t$ such that $\Delta x_t$ is $I(0)$ but $B'\Delta x_t$ is $I(-1)$ for $B'$ with rank $r$. (For $I(-1)$ see the paragraph below (2.12) in Section 2.2 of Part I.) It is assumed that $\det[A(z)] = 0$ has all roots outside the unit circle, that $\det[B(z)] = 0$ has roots outside the unit circle and possibly real unit roots, but that $\det[B(z)] = 0$ has none of the complex unit roots nor roots inside the unit circle. Moreover, we normalize $\Sigma_\varepsilon$ to $I_k$. The presence of real unit roots in $B(z)$ is a multivariate extension of (2.9) in Part I having an MA unit root. The $m_i$s in (A5.12) of Appendix 5 are assumed to be unity. This assumption excludes the multi-co-integration developed in Granger and Lee (1990), in which $m_i$ is at least 2 for $i = k, \ldots, k - j$ for some $k - j$ that is $\geq k - r + 1$.

12.2.1 VAR representation

Writing $C(L) = A(L)^{-1}B(L)$ in (12.11) it is seen that the co-integration was defined in the previous section by $\rho(C(1)) = k - r$, where $r$ is the co-integration rank. In transferring the result in Appendix 5 I drop ~ above $U$ and $V$.

$$C(L) = U(L)D(L)V(L), \qquad (12.12a)$$

$$U(L)^{-1}\Delta x_t = D(L)V(L)\varepsilon_t, \qquad (12.12b)$$

$$D(L) = \mathrm{diag}[I_{k-r}, (1 - L)I_r], \qquad (12.12c)$$

where $U(L)^{-1}$ and $V(L)$ are both polynomial (rather than rational) matrices, $U(L)$ is a rational matrix, and both $\det[U(z)^{-1}] = 0$ and $\det[V(z)] = 0$ have all roots outside the unit circle. Even though $U(L)$ and $V(L)$ are not uniquely determined by $C(L)$, the above properties of $U(L)$ and $V(L)$ hold for all their representations.

For any $k \times k$ matrix, say $X$, $(X)_{1.}$, $(X)_{2.}$, $(X)_{.1}$, $(X)_{.2}$ will denote the first $(k - r)$ rows, the last $r$ rows, the first $(k - r)$ columns, and the last $r$ columns of $X$ respectively. Partitioning (12.12b) as

$$\begin{bmatrix} (U(L)^{-1})_{1.} \\ (U(L)^{-1})_{2.} \end{bmatrix}\Delta x_t = \begin{bmatrix} (V(L))_{1.} \\ (1 - L)(V(L))_{2.} \end{bmatrix}\varepsilon_t$$

and cancelling $(1 - L)$ involved in the last $r$ rows of both sides, we get

$$D^*(L)U(L)^{-1}x_t = V(L)\varepsilon_t, \qquad (12.13a)$$

$$D^*(L) = \mathrm{diag}[(1 - L)I_{k-r}, I_r]. \qquad (12.13b)$$

Since $V(L)$ is invertible, let us write

$$\Pi(L) \equiv V(L)^{-1}D^*(L)U(L)^{-1}. \qquad (12.14)$$

Generally $V(L)^{-1}$ is not a polynomial matrix but a converging power series of $L$ with matrix coefficients, where the convergence is that of a decaying exponential.

2 See Johansen (1991a) and Banerjee et al. (1993: 146-50) for the reverse direction.


Thus $\Pi(L) = \Pi_0 + \Pi_1 L + \ldots$, and

$$\Pi(L)x_t = \varepsilon_t \qquad (12.15)$$

is a VAR possibly with infinite orders. The equation, $\det[\Pi(z)] = 0$, has roots outside the unit circle and $(k - r)$ real unit roots. The rank of $\Pi(1)$ is $r$. This is the VAR representation of the co-integration with rank $r$.³

12.2.2 Error-correction representation

Since the first $(k - r)$ diagonal elements of $D^*(1)$ are zero,

$$\Pi(1) = V(1)^{-1}D^*(1)U(1)^{-1} = (V(1)^{-1})_{.2}(U(1)^{-1})_{2.}. \qquad (12.16)$$

Since both $U(1)^{-1}$ and $V(1)^{-1}$ are non-singular, ranks of $(V(1)^{-1})_{.2}$ and $(U(1)^{-1})_{2.}$ are each $r$. Let us write $-A = (V(1)^{-1})_{.2}$ and $B' = (U(1)^{-1})_{2.}$, and remember that $A$ is $k \times r$ and $B'$ is $r \times k$. Then

$$\Pi(1) = -AB'. \qquad (12.16')$$

Because of the identity

$$\Pi(L) = \Pi(1)L + (1 - L)\Gamma(L), \qquad (12.17)$$

(12.15) is written as

$$\Gamma(L)\Delta x_t = AB'x_{t-1} + \varepsilon_t. \qquad (12.18)$$

This is the error-correction representation of the co-integration. It is a variant of the VAR representation, but its role in econometrics is far more important than the original VAR representation.

It has been said that $B'x_{t-1} = 0$ is a set of equilibrium relations (except in the case where a row of $B'$ is a unit vector), and that $B'x_{t-1}$ indicates deviations from the equilibrium when $x_{t-1}$ is observed. The matrix $A$ may be interpreted as the matrix of adjustment speeds. Its $(i, j)$ element represents the speed at which the $i$-th element of $x_t$ is adjusted from $x_{t-1}$ in the light of the deviation in the $j$-th equilibrium relation. The matrix $A$ is called the adjustment matrix or the loading matrix. In (12.18) the adjustment speed also depends upon $\Delta x_{t-1}, \Delta x_{t-2}, \ldots$. The equilibrium will be further explained in connection with economic theories in Section 12.3.

3 Readers are recommended to study another derivation of the VAR from the VMA in the co-integrated system given in Banerjee et al. (1993: 257-60). It does not use the Smith-McMillan form, and follows closely the derivation of VMA from VAR given in Johansen (1991a).

Define

the are


The rank of $\Pi(1)$ has an important role. Suppose that $\Pi(1)$ has its full rank, $k$, i.e. $r = k$. Then both $A$ and $B'$ are $k \times k$ and non-singular, and $D^*(L) = I_k$ in (12.13b). The $\det[\Pi(z)]$ vanishes only outside of the unit circle. Therefore (12.15) is a stationary VAR for $x_t$. In fact, since $A$ and $B$ may be replaced by $AF$ and $BF'^{-1}$ with any non-singular $F$ (keeping $AB'$ invariant), $B'$ may be set to $I_k$. Then $B'x_t$ being stationary means $x_t$ being stationary.

Going to the other extreme on the rank of $\Pi(1)$, suppose that $\Pi(1) = 0$. Then both $A$ and $B$ are zero, and $D^*(L) = \mathrm{diag}[(1 - L), \ldots, (1 - L)]$. It is seen that $\Gamma(L)$ is a converging power series of $L$ with convergence as fast as a decaying exponential. (12.18) is reduced to a VAR for $\Delta x_t$,

$$\Gamma(L)\Delta x_t = \varepsilon_t,$$

which is a stationary VAR for $\Delta x_t$. Its long-run covariance matrix is $\Gamma(1)^{-1}\Sigma_\varepsilon\Gamma(1)^{-1\prime}$, which is non-singular. $\{x_t\}$ is not co-integrated.

Let us consider the intermediate case between the above two extremes, $0 < \rho(\Pi(1)) = r < k$. $A$ has $r$ columns, and $B'$ has $r$ rows. The last $r$ rows of (12.13a) are

Both $U(L)^{-1}$ and $V(L)$ are polynomial matrices. Writing $B(L)' = (U(L)^{-1})_{2.}$ and $B(L)' = B(1)' + (1 - L)B^*(L)'$, it is seen that the left-hand side of (12.19) is $B(1)'x_t + B^*(L)'\Delta x_t$. Since the integration order of $x_t$ is at most unity, $B^*(L)'\Delta x_t$ is stationary. So is the right-hand side of (12.19). Therefore $B(1)'x_t$ is stationary, and $B(1)'$ is a co-integration matrix. $B(1)'$ is what was written $B'$ earlier. The co-integration rank is $\rho(\Pi(1))$.

An expression that appeals to our intuition is that in the absence of co-integration the number of unit roots is equal to that of variables ($\Delta x_t$ being a stationary VAR) while in the co-integration the number of unit roots is reduced from that of variables by the co-integration rank.
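The counting of unit roots can be seen in the simplest finite-order case. The sketch below assumes a first-order VAR, $x_t = (I + AB')x_{t-1} + \varepsilon_t$, so that $\Pi(L) = I - (I + AB')L$ and $\Pi(1) = -AB'$; the numerical values of `A` and `B` are hypothetical.

```python
import numpy as np

k, r = 3, 1
A = np.array([[-0.5], [0.2], [0.0]])   # k x r adjustment matrix (assumed values)
B = np.array([[1.0], [-1.0], [0.5]])   # k x r, so B' is the co-integration matrix

# VAR(1): x_t = (I + A B') x_{t-1} + eps_t, i.e. Pi(L) = I - (I + A B') L.
M = np.eye(k) + A @ B.T
Pi_1 = np.eye(k) - M                   # Pi(1) = -A B'

# det[Pi(z)] = 0 has roots z = 1 / lambda for the non-zero eigenvalues of M.
eigvals = np.linalg.eigvals(M)
roots = 1.0 / eigvals

n_unit_roots = int(np.sum(np.isclose(roots, 1.0, atol=1e-8)))
n_outside = int(np.sum(np.abs(roots) > 1.0 + 1e-8))
rank_Pi_1 = np.linalg.matrix_rank(Pi_1)
```

With $k = 3$ and $r = 1$ there are $k - r = 2$ unit roots and one root outside the unit circle, while the rank of $\Pi(1)$ is $r$.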

Simulation studies often require small examples of finite order VARs thathave specific co-integration properties. The following examples are presentedalso as exercises of the above mathematics.

Example 1. Suppose that we are requested to provide a VAR model of $(x_{1t}, x_{2t}, x_{3t})$ such that $x_{1t}$, $x_{2t}$, and $x_{3t}$ are each $I(1)$, $x_{1t}$ and $x_{2t}$ not co-integrated, but $x_{1t}, x_{2t}, x_{3t}$ co-integrated. The co-integration rank is 1 so that in (12.13b)

$$D^*(L) = \mathrm{diag}[(1 - L), (1 - L), 1].$$

Triangular systems are convenient because determinants are products of diagonal elements. Set

$$U(L)^{-1} = \begin{bmatrix} p_{11}(L) & 0 & 0 \\ p_{21}(L) & p_{22}(L) & 0 \\ p_{31}(L) & p_{32}(L) & p_{33}(L) \end{bmatrix},$$

where $p_{ij}(L)$ is a polynomial of $L$. ($x_2$ and $x_3$ do not cause $x_1$ in Granger's sense, and $x_3$ does not cause $(x_1, x_2)$ in Granger's sense.) An important condition is

(12.19)


Since $E(x_{1t}\varepsilon_{2s}) = 0$ for all $t$ and $s$, adding a non-stochastic multiple of $x_{1t}$ to $x_{2t}$ does not eliminate $p_{22}(L)^{-1}\sum_{s=1}^{t}\varepsilon_{2s}$, i.e. $\{x_{1t}\}$ and $\{x_{2t}\}$ are not co-integrated. The third equation is

Unless

is $I(0)$, and hence $\{(x_{1t}, x_{2t}, x_{3t})\}$ is co-integrated. Note that $p_{33}(1) \neq 0$ from the outset, but we must impose the condition that either $p_{31}(1) \neq 0$ or $p_{32}(1) \neq 0$.


that $p_{ii}(L)$, $i = 1, 2, 3$, have all roots outside the unit circle. Also set $V(L) = I_3$, and $E(\varepsilon_t\varepsilon_t') = I_3$. In (12.14)

and in (12.16) and (12.16')

In (12.17)

where

The VAR is

Its first equation is

and $\{x_{1t}\}$ is $I(1)$. The second equation is

The right-hand side is $I(0)$, and therefore $\{x_{2t}\}$ is $I(1)$. Assuming that $\varepsilon_t = 0$ for $t < 0$,


Readers are advised to derive $C(1)$ through (12.12a) and confirm that the first and second rows are linearly independent (so that $x_1$ and $x_2$ are not co-integrated) and that $(p_{31}(1), p_{32}(1), p_{33}(1))$ is in the left null space of $C(1)$ as it should be.

The present example will play an important role in the co-integrated regression explained in Chapter 13, where regressors should not be co-integrated but a regressand and regressors are co-integrated.

Example 2. Suppose that we are asked to provide a VAR in which $x_{1t}$ and $x_{2t}$ are $I(1)$ but $x_{3t}$ is $I(0)$, and $x_{1t}$ and $x_{2t}$ are co-integrated. The co-integration rank is 2.

Let us use the same $U(L)^{-1}$ as in Example 1. Here

$\{x_{1t}\}$ is $I(1)$. Since

$$p_{22}(L)x_{2t} = -p_{21}(L)x_{1t} + \varepsilon_{2t},$$

$\{x_{2t}\}$ is $I(1)$ unless $p_{21}(1) = 0$. $\{(x_{1t}, x_{2t})\}$ is co-integrated because $p_{21}(1)x_{1t} + p_{22}(1)x_{2t}$ is stationary. Note that $p_{22}(1) \neq 0$ from the outset, but we must impose the condition that $p_{21}(1) \neq 0$. From the third equation we have

$\{x_{3t}\}$ being $I(0)$ requires $-p_{31}(L)x_{1t} - p_{32}(L)x_{2t}$ being $I(0)$, which in turn requires

Let $g = p_{21}(1)/p_{31}(1)$ and


12.2.3 Properties of the error-correction representation

From (12.12a) and (12.12c) it is seen that

(12.21) is a part of the Granger representation theorem, and called the duality between $\Pi(1)$ for the VAR and $C(1)$ for the VMA representations.

Since $(V(1))_{1.}A = 0$ and $B'(U(1))_{.1} = 0$, we may write $(V(1))_{1.} \equiv A_\perp'$ and $(U(1))_{.1} = B_\perp$. Then $C(1) = B_\perp A_\perp'$. In particular $\rho(C(1)) = k - r$, which agrees with the explanation in Section 12.1. Since $U(L)$ and $V(L)$ are not uniquely determined from (12.11), it may be appropriate to re-express $C(1)$. Let $A_\perp$ and $B_\perp$ be any $k \times (k - r)$ column full-rank matrices such that $A_\perp'A = 0$ and $B_\perp'B = 0$ respectively. Different $A_\perp$s are related through right multiplications by non-singular matrices, and so are different $B_\perp$s. Construct $\Pi^*(L)$ by

Johansen (1991a) shows that

which is apparently invariant through right multiplications of $A_\perp$ and $B_\perp$ by non-singular matrices. Johansen (1992a) also shows that the non-singularity of $A_\perp'\Pi^*(1)B_\perp$ is necessary and sufficient for $\{x_t\}$ being $I(1)$ rather than $I(2)$.
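The invariance can be checked numerically. The sketch below assumes that the expression in question is the form commonly attributed to Johansen, $C(1) = B_\perp(A_\perp'\Pi^*(1)B_\perp)^{-1}A_\perp'$, with $\Pi^*(L)$ defined through $\Pi(L) = \Pi(1) + (1 - L)\Pi^*(L)$. It further assumes a first-order VAR, $x_t = Mx_{t-1} + \varepsilon_t$ with $M = I + AB'$, for which $\Pi^*(L) = M$. The values of `A`, `B`, `F1`, `F2` and the helper names are hypothetical.

```python
import numpy as np

def orth_complement(X):
    """Return a k x (k - r) matrix whose columns span {v : X'v = 0}."""
    # Full SVD of the k x r full-column-rank X: the last k - r columns of U
    # are orthogonal to the column space of X.
    U, _, _ = np.linalg.svd(X)
    return U[:, X.shape[1]:]

k = 3
A = np.array([[-0.5], [0.2], [0.0]])   # adjustment matrix (assumed values)
B = np.array([[1.0], [-1.0], [0.5]])   # co-integrating vector (assumed values)
M = np.eye(k) + A @ B.T                # VAR(1): x_t = M x_{t-1} + eps_t
Pi_star_1 = M                          # Pi(L) = -A B' + (1 - L) M for this VAR(1)

def long_run_matrix(A_perp, B_perp):
    """C(1) = B_perp (A_perp' Pi*(1) B_perp)^{-1} A_perp' (assumed formula)."""
    middle = A_perp.T @ Pi_star_1 @ B_perp
    return B_perp @ np.linalg.solve(middle, A_perp.T)

A_perp, B_perp = orth_complement(A), orth_complement(B)
C1 = long_run_matrix(A_perp, B_perp)

# Same formula after right-multiplying A_perp and B_perp by non-singular matrices.
F1 = np.array([[2.0, 1.0], [0.0, 1.0]])
F2 = np.array([[1.0, 0.0], [3.0, -1.0]])
C1_alt = long_run_matrix(A_perp @ F1, B_perp @ F2)

# Independent check: the impulse responses M^n converge to C(1) as n grows.
C1_limit = np.linalg.matrix_power(M, 200)
```

In this example $B'C(1) = 0$ and $C(1)A = 0$ also hold, in line with the duality between $\Pi(1)$ and $C(1)$.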

Incidentally $\Pi^*(L)$ in (12.22) is related to $\Gamma(L)$ in (12.17) through

Since $A_\perp'\Pi(1)B_\perp = 0$ because of (12.16'), (12.20') can also be written

4 Hiro Toda has kindly suggested (12.20") to me.


The above $B'$ may be replaced by

provided that $-A$ is replaced by

It is seen that $x_{3t}$ is $I(0)$.

(12.20)

(12.21)

(12.22)

(12.20')

(12.20'')

Since and and imply


In most econometric applications an infinite power series $\Pi(L)$ in (12.15) is approximated by a finite order, $q$. From now we free $\Sigma_\varepsilon$ from the normalization, and instead normalize $\Pi_0$ in $\Pi(L)$ to $I_k$ by adjusting $\Pi_0, \Pi_1, \ldots$, and $\Sigma_\varepsilon$. Adopting a standard way to write a finite-order VAR, (12.15) is replaced by

and (12.18) by

The truncation of the infinite power series of $L$ in $\Pi(L) = V(L)^{-1}D^*(L)U(L)^{-1}$ can be justified as follows. It has been assumed that $V(z)$ is a polynomial matrix such that $\det[V(z)] = 0$ has all roots outside the unit circle. Therefore in $V(L)^{-1} = V_0 + V_1L + V_2L^2 + \ldots$, $\{\|V_j\|\}$ is bounded by a decaying exponential, by which I mean the existence of $c$ such that $0 < c < 1$ and $\|V_j\| < c^j$, $j = 0, 1, \ldots$ $U(L)^{-1}$ is a polynomial (rather than a rational) matrix, and $D^*(L)$ is given in (12.13b). Then, in $\Pi(L) = \Pi_0 + \Pi_1L + \Pi_2L^2 + \ldots$, it is easily seen that $\{\|\Pi_j\|\}$ is also bounded by a decaying exponential, and $\Pi(L)$ may be approximated by its truncated version. Once $\Pi(L)$ is truncated as in (12.23), $\Gamma(L)$ is also truncated as in (12.24).⁵

It has been noted that neither $A$ nor $B$ is unique. This is a nice opportunity to explain the identification problem. It is known that the $\Pi$s in (12.23) are identified no matter whether unit roots are involved, in so far as the leading term $\Pi_0$ is normalized to $I$. Therefore $AB' = -\Pi(1)$ is identified though $A$ and $B'$ are not separately. In particular the rank of $AB'$, the co-integration rank, is identified, which is consistent with our previous result on the VMA representation. Moreover, all of $\Gamma_1, \ldots, \Gamma_{q-1}$ are identified.

12.2.4 Common trends once more

I briefly return to the MA representation in Section 12.1. Notice that $C(L)$ in Section 12.2 is $A(L)^{-1}B(L)$ in (12.11). Once (12.11) is accepted, $C(L)$ in Section 12.2 is identical to $C(L)$ in (12.1) in Section 12.1. (In fact $\{\varepsilon_t\}$ in Section 12.2 is also identical to $\{\varepsilon_t\}$ in Section 12.1.) In Section 12.1.2 the common stochastic trends are $\sum_{s=1}^{t}\xi_s = H_2'\sum_{s=1}^{t}\varepsilon_s$. Because of (12.20') $H_1$ may

5 Engle and Yoo (1991) point out that $\Gamma(L)$ may not be invertible. An easy way to appreciate the point is to investigate a single equation, $a(L)y_t + b(L)x_t = \varepsilon_t$, assuming that $a(L) = a_0 + a_1L + \ldots + a_pL^p$ is invertible. Construct $a^*(L)$ and $b^*(L)$ by $a(L) = a(1)L + (1 - L)a^*(L)$, and $b(L) = b(1)L + (1 - L)b^*(L)$. We get $a^*(L)\Delta y_t + a(1)y_{t-1} + b(1)x_{t-1} + b^*(L)\Delta x_t = \varepsilon_t$, which is an error-correction form. But $a^*(L)$ may not be invertible as Engle and Yoo (1991) point out even if $a(L)$ is invertible. Construction of a counter-example is easy. One could also use $a(L) = a(1) + (1 - L)a^{**}(L)$ and $b(L) = b(1) + (1 - L)b^{**}(L)$. We then get a form analogous to the Bewley representation. See Bewley (1979), Hylleberg and Mizon (1989), and Banerjee et al. (1993: 54). It also has the same problem. Compare these expressions with (14.15). The present remark should not be taken as a criticism of the error-correction model and the Bewley model. It only warns readers about the treatment of $a^*(L)$ or $a^{**}(L)$.
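One possible counter-example of the kind mentioned in this footnote can be sketched as follows. The polynomial $a(L) = (1 + 0.9L)^3$ is a hypothetical choice, and the helper name `ecm_polynomial` is an assumption; the decomposition $a(L) = a(1)L + (1 - L)a^*(L)$ is the one stated in the footnote.

```python
import numpy as np

def ecm_polynomial(a):
    """Given ascending coefficients of a(L), return those of a*(L) in
    a(L) = a(1) L + (1 - L) a*(L). Requires degree p >= 1."""
    a = np.asarray(a, dtype=float)
    # a*_0 = a_0 and a*_j = -(a_{j+1} + ... + a_p) for j >= 1.
    tail_sums = np.cumsum(a[::-1])[::-1]     # tail_sums[j] = a_j + ... + a_p
    a_star = -tail_sums[1:]
    a_star[0] = a[0]
    return a_star

# Hypothetical a(L) = (1 + 0.9 L)^3, whose roots all lie outside the unit circle.
a = np.array([1.0, 2.7, 2.43, 0.729])
a_star = ecm_polynomial(a)

# Rebuild a(L) from the decomposition as a check.
rebuilt = np.convolve([1.0, -1.0], a_star)   # (1 - L) a*(L)
rebuilt[1] += a.sum()                        # + a(1) L

# np.roots expects coefficients in descending powers.
min_root_a = np.min(np.abs(np.roots(a[::-1])))
min_root_a_star = np.min(np.abs(np.roots(a_star[::-1])))
```

Here $a^*(L) = 1 - 3.159L - 0.729L^2$ has a root inside the unit circle although $a(L)$ is invertible.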

(12.23)

(12.24)


where $F_{21}(L) = (A'A)^{-1}A'(\Gamma(L)C(L) - I)A_\perp(A_\perp'A_\perp)^{-1/2}$ and $F_{22}(L) = (A'A)^{-1}A'(\Gamma(L)C(L) - I)A(A'A)^{-1/2}$. By a generalized version of Theorem 11.1 in Chapter 11 it is seen that $\zeta_t$ does not cause $\Delta f_t$ in Granger's sense.

12.2.5 Non-zero $\mu$

Replace $\{\varepsilon_t\}$ by $\{(\mu + \varepsilon_t)\}$ in (12.11). Thus (12.24) is replaced by

The only point worth noting is the one already mentioned at the end of the previous Section 12.1.3. In fact Johansen (1991a) derives the VMA representation (12.10) with (12.20') from the error-correction representation (12.26) so that we may proceed with (12.10). As demonstrated in (12.10) $\mu$ produces a linear trend in $\{x_t\}$ only through $tC(1)\mu$. Recall that $C(1)A = 0$. If $\mu$ is a linear


be taken as $A(A'A)^{-1/2}$, and $H_2$ as $A_\perp(A_\perp'A_\perp)^{-1/2}$. The common stochastic trends, $f_t = H_2'\sum_{s=1}^{t}\varepsilon_s$, are also

$$f_t = (A_\perp'A_\perp)^{-1/2}A_\perp'\sum_{s=1}^{t}\varepsilon_s. \qquad (12.25)$$

From (12.20') it is seen that they enter into $x_t$ as

While $B'$ defines the long-run equilibrium, it is $B_\perp$ and $A_\perp$ that contribute to the formation of the common trends.

Gonzalo and Granger (1991) and Konishi and Granger (1992) note that the equilibrium error does not cause the common trend in Granger's sense. To be more concrete, consider the $r$-variate vector process of $\zeta_t = B'x_{t-1}$ and the $(k - r)$-variate vector process of $\Delta f_t = (A_\perp'A_\perp)^{-1/2}A_\perp'\varepsilon_t$. Then it can be shown that $\zeta_t$ does not cause $\Delta f_t$ in Granger's sense. Suppose that $E(\varepsilon_t\varepsilon_t') = I_k$ in (12.11), and that this normalization is carried through (12.15) and (12.18). Left multiplication of (12.18) by $(A'A)^{-1}A'$ leads to

Since

is an orthogonal matrix, let $u_t = H\varepsilon_t$, $u_{1t} = (A_\perp'A_\perp)^{-1/2}A_\perp'\varepsilon_t$, and $u_{2t} = (A'A)^{-1/2}A'\varepsilon_t$. Then $E(u_tu_t') = I_k$, and a VMA representation of $(\Delta f_t', \zeta_t')'$ is

(12.26)


combining the deterministic linear trends and the stochastic trends. (Needless to say this does not mean that each element of $x_t$ has the same slope of linear trend.) Note that $B'$ annihilates both of these trends at once.

I now comment on constants in deterministic trends.

1. The model (12.10) admits a constant in the $j$th element of $x_t$ only when the $j$th element of $E(x_0)$ is non-zero and the $j$th element of $C(1)\mu$ is zero. The model admits a non-zero constant term in the linear trend of the $j$th element of $x_t$ only when the $j$th element of $E(x_0)$ is non-zero.

2. It is also seen from (12.10) that $E(B'x_t) = B'E(x_0)$. If the equilibrium error has a non-zero mean at the initial stage, the non-zero mean stays on for ever. I shall comment on this problem in Section 12.3.2 below.

3. A constant term is identified if it is involved in a stationary process. In particular $E(B'x_t) = B'E(x_0)$ is identified once $B'$ is identified. See Sections 11.1.2 and 15.2.1 on different ways to introduce identifiability into $B'$.

Johansen (1992d) shows that a modification of (12.26) enables us to incorporate a co-integration space that annihilates only the stochastic trends. Introduce $\mu_1 t$ into (12.26) to get

$$\Delta x_t = AB'x_{t-1} + (\Gamma_1L + \ldots + \Gamma_{q-1}L^{q-1})\Delta x_t + \mu_0 + \mu_1 t + \varepsilon_t. \qquad (12.27)$$

(12.10) is replaced by

$B'$ annihilates the stochastic trends $C(1)\sum_{s=1}^{t}\varepsilon_s$, but does not annihilate the linear trends because $B'C^*(L)\mu_1 t \neq 0$. However, $B'$ does annihilate the quadratic trends. If we wish $\{x_t\}$ to lack quadratic trends in the first place before being annihilated by $B'$, we must impose the condition, $A_\perp'\mu_1 = 0$.

12.2.6 Granger non-causality

Granger non-causality was explained in Section 11.3, but the case where $x_{1t}$ and $x_{2t}$ are co-integrated has been deferred. Suppose that $k = 2$ and $r = 1$ in (12.24):


combination of columns of $A$, $C(1)\mu = 0$ and the linear trends are absent from every element of $\{x_t\}$. Because of (12.20')

and this shows how $\mu$ enters into $x_t$. This is parallel to the way in which the stochastic common trends are formed and entered into $x_t$. Therefore it may be useful to define the common trends (as they enter into $x_t$) as

(12.28)


The reasoning set out in Section 12.1.3 is applicable even in the present case where $B'\Delta x_t$ is $I(-1)$, and $x_{2t}$ does not cause $x_{1t}$ in Granger's sense if and only if $a_1b_2 + \gamma_{12}^{(i)} = 0$, $i = 1, \ldots, q - 1$ and $a_1b_2 = 0$. The condition can be rewritten as $a_1b_2 = 0$ and $\gamma_{12}^{(i)} = 0$, $i = 1, \ldots, q - 1$. Granger non-causality is the triangularity of $AB', \Gamma_1, \ldots, \Gamma_{q-1}$ in the error-correction form of VAR (12.24).

12.3 Economic Theory and Co-integration

The co-integration in Sections 12.1 and 12.2 is a statistical rather than an economic model. This should be obvious from the explanation in Section 12.1. In Section 12.2 words such as the equilibrium and equilibrium error have been used, but the main conclusion such as (12.18) is simply rewriting the model in Section 12.1 in a different mathematical form. One may wonder why it has been written in the literature that $B'x_t$ is the equilibrium error which drives the system towards equilibrium. This association between economic theory and co-integration has been questioned in some literature. Here I justify the association.

Three introductory remarks are in order. First, economic theory requires a particular representation of the co-integrating vectors that are not unique otherwise. Let $y$, $m$, and $i$ be the log income, log money stock, and interest rate, and suppose that $y$ and $m$ are $I(1)$ while $i$ is $I(0)$. The variables are assembled in vector, $(m, y, i)$. Since $i$ is $I(0)$, $(0, 0, 1)$ is a co-integrating vector. If demand function for money holds, $(1, -1, -b)$ is another co-integrating vector. Any non-singular transformations (rotations) of these two vectors are contained in the co-integration space. But the vector that is meaningful from the standpoint of the economic theory is only $(1, -1, -b)$. Others, for example $(1, -1, 0)$, have no economic meaning. Nor does $(0, 0, 1)$. This point will be discussed further in Sections 12.3.7 and 15.2.1.

As explained in relation to (11.20) Granger non-causality of $x_{2t}$ upon $x_{1t}$ may be examined by comparing two predictions of $\Delta x_{1t}$. One is based upon $[\Delta x_{1,t-1}, \ldots, \Delta x_{11}]$, and the other upon $[\Delta x_{t-1}, \ldots, \Delta x_1]$, assuming that $x_0 = 0$. Substituting $x_{h,t-1} = \sum_{i=1}^{t-1}\Delta x_{h,t-i}$, $h = 1, 2$, into the first equation, it is seen that


Second, it is said that the co-integration is related to the long-run equilibrium. However, what distinguishes the short-run and long-run equilibria in economic dynamics is the length of time that the system needs in order to restore the equilibria when the system is shocked once but left undisturbed afterwards. On the other hand, the long-run and short-run are distinguished by integration orders in the co-integration analysis. Therefore the short-run equilibrium in economics may well provide a co-integrating relation as the uncovered interest parity did in many studies.

Third, from the standpoint of equilibrium analysis of economic theory the co-integration space should nullify both the deterministic and the stochastic trends at once. The same opinion has been expressed in Han and Ogaki (1991). In this chapter I shall be concerned only with the deterministic co-integration in the terminology of Ogaki and Park (1992). We are naturally led to this restriction by introducing the drift term $\mu$ as in (12.9) and/or (12.26). It is admitted however that some long-run economic relationships are thought to hold only in abstraction from the deterministic trends. For example, Stock and Watson (1989) investigate neutrality of money in terms of the stochastic trends only. Ogaki (1992) and Ogaki and Park (1992) note a case where only the stochastic co-integration is expected in connection with consumption of goods and the relative prices.

12.3.1 Long-run relations in economics and co-integration

Let us ask ourselves what economic sense one can make out of the co-integrating vectors assuming that the vectors may be rotated if necessary. To answer the question I consider a model sensible from the standpoint of economic theory, transform it in the error-correction form of the Granger representation, and examine the correspondence between the parameters of the economic model and the statistical error-correction form. This demonstration also provides a nice exercise of the mathematics in Sections 12.1 and 12.2.

Campbell (1987) and Campbell and Shiller (1988) prove that the present-value model given in Section 1.3 of Part I necessarily involves a co-integrating vector. The proof will be given in (12.29) and (12.40) of this Chapter.⁶ Another example of the association between the co-integration and economic theories is the real business cycle theory. The business cycle is interpreted by the equilibrium analysis of processes in which effects of shocks are propagated. One kind of shock is that in the growth rate of productivity, and economic theory explains how stochastic trends are generated. This is not seen elsewhere. See King et al. (1991), of which a brief summary is given in Section 15.5.

Here I consider a model in the error-correction literature. In Alogoskoufis and Smith (1992) an economic agent wishes the target variable $y^*_t$ to be equal to

6 Campbell (1987) emphasizes the difference between the forward-looking nature of the present-value model and the disequilibrium adjustment of the error-correction model such as Davidson et al. (1978).


$k + x_t$, where $x_t$ is exogenous, but actually adjusts the control variable $y_t$ by

(12.29)

This is a type of the error-correction model which has been developed as an economic model in Sargan (1964), Davidson et al. (1978), and Hendry and von Ungern-Sternberg (1980) among others. The long-run equilibrium relation is (12.30), and (12.29) has the equilibrium error at $t - 1$, $y^*_{t-1} - y_{t-1}$, on the right-hand side. It is assumed throughout the present explanation that $1 > \gamma > 0$. The $\beta$ is not necessarily the discount factor,⁷ and if $\beta = \gamma$, (12.29) is reduced to an expectation version of the partial-adjustment model,

which was dominating in econometrics prior to the error-correction model.

Though embarrassing it must be admitted that there are a number of different definitions of long-run equilibrium in the terminology of economic theory. In my judgement economic theory often considers a hypothetical situation where $x_t$ is perfectly predictable and $\varepsilon_{1t}$ is absent from (12.29). The long-run equilibrium relation is the one between $y_t$ and $x_t$ that would hold in this situation as $t \to \infty$. When $x_t$ is perfectly forecastable and $\varepsilon_{1t}$ is absent in the model that consists of (12.29) and (12.30), the relation at a finite $t$ is

(12.31)

If $\beta = \gamma$ as in the partial adjustment, (12.31) is specialized to

To those economists who consider $y_t = k + x_t$ as the right long-run equilibrium this is a defect of the partial-adjustment model because the equilibrium is unattainable.⁸ Let us turn to the more general error correction. If $\beta \neq \gamma$ and if $x_t = \mu + ct$, $c \neq 0$, we have as $t \to \infty$,

(12.32)

If ft = 1 we attain the long-run relation, y, = k + x,, but otherwise the relationis again unattainable.

Finally, it is seen that, no matter what β and γ may be,

7 (12.29) and (12.30) should not be confused with the Euler equation in the linear quadratic model. A difference equation somewhat similar to (but different from) (12.29) appears in the planning horizon, but the equation has to be solved for the immediate decision variable to get the behaviour equation there. On the other hand (12.29) itself is the behaviour equation.

8 See Salmon (1982) for more on this point.

(12.33)

(12.30)

we have as


as t → ∞ if x_t = μ + ct, c ≠ 0. (12.33) is obtained as follows. In the spirit of the undetermined coefficient method set x_t = μ_1 + c_1 t and y_t = μ_2 + c_2 t, and substitute them into (12.31). With a(z) = (1 - (1 - γ)z)^{-1} the coefficient of t on the right-hand side is a(1)γc_1, which is c_1. The coefficient of t on the left-hand side is simply c_2. Therefore, if and only if c_1 = c_2 the terms in O(t) vanish from (12.31), and (12.33) holds. (Though μ_1 and μ_2 are determined so as to make (12.31) hold even in the terms in O(1), they are irrelevant to the long-run equilibrium considered in (12.33).) We call (12.33) the balanced-growth version of long-run equilibrium.9 Keep in mind the use of a(1) to get the coefficient of t.
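The distinction between the target relation y_t = k + x_t and the balanced-growth version can be checked numerically. The following is a minimal sketch, not the author's derivation: it assumes that the deterministic adjustment rule takes the form y_t - y_{t-1} = β(x_t - x_{t-1}) + γ(k + x_{t-1} - y_{t-1}), which is consistent with the verbal description of (12.29) and (12.30) under perfect foresight, and the function name and parameter values are purely illustrative.

```python
def long_run_gap(beta, gamma, c, k=2.0, mu=1.0, periods=400):
    """Iterate the assumed adjustment rule and report two long-run quantities.

    Returns (gap, scaled_diff) where
      gap         = k + x_t - y_t     distance from the target y* = k + x
      scaled_diff = (x_t - y_t) / t   the balanced-growth quantity
    The rule itself is an assumption made for illustration.
    """
    x_prev = mu          # x_0
    y_prev = k + mu      # start on the target, y_0 = k + x_0
    for t in range(1, periods + 1):
        x = mu + c * t   # deterministic linear trend in x
        # assumed rule: perfect foresight, no disturbance term
        y = y_prev + beta * (x - x_prev) + gamma * (k + x_prev - y_prev)
        x_prev, y_prev = x, y
    return k + x_prev - y_prev, (x_prev - y_prev) / periods


if __name__ == "__main__":
    for beta in (0.3, 0.6, 1.0):
        gap, scaled = long_run_gap(beta=beta, gamma=0.3, c=0.5)
        print(f"beta={beta}: gap={gap:.4f}, (x - y)/t={scaled:.5f}")
```

Under this assumed rule the gap obeys g_t = (1 - γ)g_{t-1} + (1 - β)c, so it settles at (1 - β)c/γ: zero only when β = 1, and non-zero in the partial-adjustment case β = γ, while (x_t - y_t)/t shrinks towards zero for any β.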

Now we must determine which one of the above relations is represented as the long-run equilibrium in the co-integration analysis. Suppose that

where d(L) = d_0 + d_1 L + ... + d_q L^q, and d(z) = 0 has all roots outside the unit circle so that d(L) is invertible. {ε_{1t}} has appeared in (12.29), and we assume that {(ε_{1t}, ε_{2t})} is i.i.d. with zero mean and covariance matrix I_2. By virtue of Theorem 11.1 in Section 11.3 it is seen that y_t does not cause x_t in Granger's sense, and that

Substituting this into (12.29) and using (12.34) with the assumption that x_t = 0 for t < 0, we get

Differencing and using (12.34) leads to

where δ(L) = β(d(L) - d_0) + γd*(L)L and d(L) = d(1) + (1 - L)d*(L).

(12.34) and (12.35') provide a nice exercise in understanding the theoretical structure of co-integration. The two equations form a VARMA for (Δx_t, Δy_t)',

9 When (12.30) is replaced by y*_t = k + cx_t, (12.33) is t^{-1}(cx_t - y_t). If x_t = log X_t and y_t = log Y_t, the growth rates of X_t and Y_t are not equal unless c = 1. Perhaps we should look for a word other than balanced growth.

(12.36)

(12.34)

(12.35')

(12.35)


This is expressed as a VMA,

where

Therefore

V (L) is a polynomial matrix, and so is V (L).

10 In the present case the desired result (12.37) follows directly from the construction of δ(L) without a Smith-McMillan form. The matrix of the MA part of (12.36) is

and premultiplying this by

produces

Therefore the matrix of the MA part of (12.36) is:

and a representation of the co-integrating vector is (1, -1). It follows that

(12.37)

where


The error-correction representation is

(12.38)

On the right-hand side of (12.38) the 2 × 2 coefficient matrix of [x_{t-1}, y_{t-1}]' may be replaced by

but the one in (12.38) makes sense from the standpoint of economic theory. The positive value of y_{t-1} - x_{t-1} adjusts y_t downward. The co-integration nullifies the deterministic and the stochastic trends at once. The long-run relationship is y_{t-1} = x_{t-1}. It turns out that the first equation of (12.38) is a reproduction of (12.34), and the second equation is (12.29) rewritten with the aid of (12.34) except that k is missing from (12.38). In fact k has disappeared at (12.35').

The economic model (12.29) that incorporates the partial adjustment as a special case (β = γ) has been expressed in the error-correction form in the terminology of co-integration analysis.

It was seen that economic theory had a number of versions of long-run equilibrium relations different from the co-integration vector (1, -1). The one that comes closest to this is (12.33), the balanced-growth version of long-run equilibrium. It contains no information about k, nor does it indicate whether β = γ or β ≠ γ.

If we had adopted the present-value model instead of (12.29), (12.30) in the above consideration, we would reach the same conclusion that it is the balanced-growth version of the long-run equilibrium that is dealt with in the co-integration analysis.


12.3.2 A short-cut alternative to 12.3.1

Suppose that {u_t} is a stationary scalar process with zero mean, v_t = Σ_{s=1}^{t} u_s, a(L) is a scalar infinite power series of L, and a(L) = a(1) + (1 - L)a*(L). Then note that (a(L) - a(1))v_t is I(0) because (a(L) - a(1))v_t = a*(L)u_t + initial terms.

Consider the economic error-correction model, (12.29) and (12.30). Earlier an undetermined coefficient method was applied to the deterministic version, (12.31). A method parallel to it is available for the stochastic model. Substituting (12.30) into (12.29) we get

(12.39)

Let a(L) = 1 - (1 - γ)L. Then a(1) = γ. The I(1) part of the left-hand side of (12.39) is γy_t because (a(L) - a(1))y_t is I(0). The I(1) term on the right-hand side is γx_t, because E(x_t | I_{t-1}) - x_{t-1} is generally I(0). The I(1) terms dominate the I(0) terms. The long-run relation is t^{-1/2}(x_t - y_t) → 0. This shows why it is the balanced-growth version of long-run relations that corresponds to that in the co-integration analysis. In the present model the same relation holds on the deterministic and stochastic trends.

12.3.3 Parameters in the economic model and the co-integration model

As explained in Section 11.1.2 a (vector) parameter is identified in a model when a given probability distribution of observations uniquely determines the value of the parameter that produces the distribution in the model. In the model that consists of (12.29), (12.30), and (12.34), (d_0, d_1, ..., d_q) is identified in (12.34), and (γ, β) in (12.29). However, k is not identified asymptotically as x_t and y_t are both I(1). This is seen from the Fisher information of k failing to expand to infinity as T does. The Fisher information, which is (-1) times the expectation of the second-order derivative of the log-likelihood function, indicates a response of the probability distribution to an infinitesimal difference in the parameter values.

Let us then consider how we can determine γ, β, and (d_0, d_1, ..., d_q) from the probability distribution of observations {(x_t, y_t)} in the statistical error-correction model, (12.38), or, putting it equivalently, from the identified parameters of (12.38). This would also clarify how the parameter of the economic model can be recovered from that of the statistical error-correction model. The identification in the general error-correction form (12.24) has been considered at the end of Section 12.2.3. The co-integration rank is identified so that r is determined as unity. Therefore A is a = (a_1, a_2)' and B' is b' = (b_1, b_2). A and B are not separately identified but AB' is. Therefore a_1b_1, a_2b_1, a_1b_2, and a_2b_2 are identified, and they can be uniquely determined from the probability distribution of observations. We also know from the distribution that y_t does not cause x_t in Granger's sense (see Section 11.3). Therefore as shown below in (12.43) we must have a_1b_1 = a_1b_2 = 0. The γ can be determined from a_2b_2 or -a_2b_1. (d_0, d_1, ..., d_q) and β can be determined from Γ(L) as Γ(L) is identified.


12.3.4 Stability condition

In economic theory it is important to ascertain whether a given equilibrium relation satisfies the stability condition. Let B'x_t = 0 be the equilibrium relations. In reality B'x_t = u_t, where {u_t} is a stochastic process with zero mean, representing deviations from the equilibrium due to shocks. Suppose that if the shocks are absent after a time, t_0, B'x_t converges to zero as t → ∞. This means that the equilibrium is eventually restored when the extraneous shocks cease to disturb the system. The equilibrium is then said to be stable.

The equilibrium B'x_t = 0 does satisfy the stability condition in the error-correction representation (12.18) with μ = 0. To prove it we restart from (12.12*),

If ε_t = 0 after a time t_0, Δx_t → 0 as t → ∞, where → means convergence in probability, because V (L) is a polynomial matrix and V (L) is expressed as a converging infinite power series of L. In (12.19) both (V(L))_{2.} and (U(L)^{-1})_{2.} are polynomials, and (V(L))_{2.}ε_t is zero for t > t_0 + q, where q is the order of the polynomial (V(L))_{2.}. Using B(L)' = B(1)' + (1 - L)B*(L)' and the reasoning presented below (12.19), it is seen that Δx_t → 0 implies B(1)'x_t → 0 as t → ∞.

In the model (12.26) with μ ≠ 0 there is a possibility that E(B'x_t) ≠ 0. When shocks cease to disturb the system at time t_0, B'x_t may converge to a non-zero constant, which violates the stability of equilibrium (see Section 12.2.5 above). Therefore the examination of stability requires a test for E(B'x_t) = 0 after it is confirmed by the co-integration analysis that B'x_t is stationary. For the investigation of E(B'x_t), B must be identified by a normalization rule or by some constraints due to economic theories (see Section 15.2.1 below). Then the Johansen method described in Chapter 15 gives an estimate of B, B̂. Since B̂ - B is O_p(T^{-1}), we may regard it as though B̂ = B in testing for the zero mean of B'x_t. Testing for zero mean in the stationary process is a standard problem. See, for example, Fuller (1976: 230-2).
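The stability notion can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the text: a hypothetical bivariate system in which x_t is a random walk and y_t error-corrects towards x_t with speed `gamma`, so that the equilibrium error is y_t - x_t. The shocks are switched off after `t0`, and an optional `intercept` in the y equation mimics a non-zero constant term.

```python
import random


def equilibrium_error_path(gamma=0.4, intercept=0.0, t0=200, extra=100, seed=0):
    """Simulate y_t - x_t for a hypothetical error-correction system.

    Assumed system (illustrative only):
        x_t - x_{t-1} = e1_t
        y_t - y_{t-1} = intercept + gamma * (x_{t-1} - y_{t-1}) + e2_t
    Shocks e1, e2 are N(0, 1) up to time t0 and exactly zero afterwards.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    errors = []
    for t in range(1, t0 + extra + 1):
        shocks_on = t <= t0
        e1 = rng.gauss(0.0, 1.0) if shocks_on else 0.0
        e2 = rng.gauss(0.0, 1.0) if shocks_on else 0.0
        dx = e1
        dy = intercept + gamma * (x - y) + e2   # uses last period's levels
        x, y = x + dx, y + dy
        errors.append(y - x)                    # equilibrium error at time t
    return errors


if __name__ == "__main__":
    print(equilibrium_error_path()[-1])               # error after shocks cease
    print(equilibrium_error_path(intercept=0.2)[-1])  # same, with a constant term
```

Once the shocks cease, the error in this sketch follows z_t = (1 - gamma) z_{t-1} + intercept. With a zero intercept it dies out geometrically, whereas with a non-zero intercept it settles at intercept/gamma, which is the kind of non-zero limit that a test of E(B'x_t) = 0 is meant to detect.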

Fukushige, Hatanaka, and Koto (1994) investigate the stability of a single equilibrium relation accounting for the possibility that the dominating root of the characteristic polynomial may be complex-valued.

12.3.5 Adjustment towards the equilibrium

In the model (12.29) it is the will of economic agents that restores the equilibrium. In the price model it is the arbitrage among market participants that drives the price system towards equilibrium. That the expectation of participants may also contribute to establishing the equilibrium is analysed in the literature of the theory of rational expectations (see, for example, Pesaran (1987, chs. 3, 4)) and also in the econometric literature such as Pagan (1985) and Nickell (1985). As explained in Section 11.3 Granger causality is an important concept


in discriminating different channels of contributions to the prediction of multiple time series. Granger (1986) and Campbell and Shiller (1987, 1988) reveal an important Granger-causality relation that contributes to the movement toward equilibrium in the co-integrated VAR. The results in these studies are nearly identical, and I shall present here Campbell and Shiller (1987, 1988).

Campbell and Shiller (1988) begin by demonstrating that the general present-value relationship between x and y is a co-integrating relation. Let

(12.40)

where 1 > δ > 0, and I_t is the information set at t, including (x_t, y_t), (x_{t-1}, y_{t-1}), .... Both x_t and y_t are I(1). It is easy to show that z_t = y_t - θx_t is I(0), because

(12.41)

It should be obvious that the extreme right-hand side of (12.41) is a stationary stochastic process, but I will explain it in some detail. Note first that (12.40) is a stochastic process because the conditioning variables in I_t are random variables. A bivariate stochastic process, {(Δx_t, Δy_t)}, is generated by (12.1). {ε_t} is a two-element i.i.d. process with zero mean and covariance matrix I_2. The C_j's are each 2 × 2, and let c_{1j} be the first row of C_j, j = 0, 1, .... I assume c in (12.40) is zero for simplicity.

which is stationary.

Campbell and Shiller (1988) show that except for a pathological case z does cause Δx in Granger's sense, i.e. the equilibrium error has a net contribution beyond the past Δx to the prediction of future Δx. Since Δy_t = Δz_t + θΔx_t, the three-element vector process {(Δx_t, Δy_t, z_t)} is generated by the two-element vector i.i.d. {ε_t} as follows.

Suppose that z_t does not cause Δx_t in Granger's sense in the bivariate system {(Δx_t, z_t)}. Then by virtue of Theorem 11.1 in Section 11.3 either all of the first

Then

Therefore


elements of c_{10}, c_{11}, ... may be chosen to be zero, or all of the second elements of c_{10}, c_{11}, ... may be chosen to be zero. Then {(Δx_t, Δy_t, z_t)} is driven by a scalar i.i.d. process. The conclusion is that either z causes Δx in Granger's sense in the bivariate system or else z and Δx are essentially the same process.11

Ignoring the latter possibility we may say that the equilibrium error sets forth a movement toward the equilibrium through the expectation of future x. Earlier in Section 12.2.5 it was shown that the equilibrium error has no influence upon the common trends.

An interesting feature of the co-integration is its relation to the efficient market hypothesis, more precisely the martingale property (see Section 1.2.4). The result (12.18) states that if {x_t} is I(1) and at least one co-integrating vector exists,

where Π(1) = -AB' is not zero. Granger (1986) points out that this means E(Δx_t | x_{t-1}, x_{t-2}, ...) ≠ 0, which in turn means that {x_t} is not a martingale process. The price system in the efficient market cannot be co-integrated. My verbal explanation is that the efficient market completes the required adjustment within the same time unit so that there is no need to drive the system towards the equilibrium in the next period.12 See Baillie and Bollerslev (1989), Hakkio and Rush (1989), and Copeland (1991) for the applications to foreign-exchange markets and subsequent discussions in Diebold, Gardeazabal, and Yilmaz (1994) and Baillie and Bollerslev (1994). Doubts are cast on co-integrations among different daily spot nominal exchange rates.
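Granger's observation that a co-integrated system cannot be a martingale can be made concrete by regressing the change in one variable on the lagged equilibrium error. The sketch below rests on an assumed bivariate system, not on any model in the text: x_t is a random walk, y_t corrects a fraction `gamma` of last period's error y_{t-1} - x_{t-1}, and the function name is hypothetical.

```python
import random


def error_correction_slope(gamma=0.4, periods=5000, seed=1):
    """OLS slope (no intercept) of the change in y on the lagged error y - x.

    Assumed data-generating process (illustrative only):
        x_t - x_{t-1} = e1_t
        y_t - y_{t-1} = -gamma * (y_{t-1} - x_{t-1}) + e2_t
    with e1, e2 independent N(0, 1).
    """
    rng = random.Random(seed)
    x = y = 0.0
    sum_zz = 0.0    # sum of squared lagged errors
    sum_zdy = 0.0   # sum of lagged error times the change in y
    for _ in range(periods):
        z_lag = y - x
        dx = rng.gauss(0.0, 1.0)
        dy = -gamma * z_lag + rng.gauss(0.0, 1.0)
        sum_zz += z_lag * z_lag
        sum_zdy += z_lag * dy
        x += dx
        y += dy
    return sum_zdy / sum_zz


if __name__ == "__main__":
    print(error_correction_slope())
```

In this sketch the estimated slope stays close to -gamma rather than zero, so the past of the system helps to predict the change in y; under the martingale property the slope would be zero.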

Consider the adjustment towards the equilibrium when a sort of exogeneity is involved. Suppose (i) that Δx_t is partitioned as Δx_t' = (Δx_{1t}', Δx_{2t}') with (k - r) and r elements in Δx_{1t} and Δx_{2t} respectively, (ii) that both A(L) and B(L) in (12.11) have the same block lower triangularity,

and (iii) that Σ_ε = I_k. (This means that Π_0 is freed from the normalization.) Then C(L), U(L), and V(L) also have the same structure, and Δx_2 does not cause Δx_1 in Granger's sense. If this partitioning coincides with that in D(L) in (12.12c), x_{1t} and x_{2t} are co-integrated and B'x_t is stationary. A = -(V(1)^{-1})_{.2}

11 An analysis in Johansen (1992a) can be used to show how this reasoning can be generalized. B'x_t does cause at least one element of x_t in Granger's sense unless A'Π*(L)B_⊥ = 0, where Π*(L), A, and B_⊥ have been defined in Section 12.2.

12 The theory of efficient market contains a number of aspects, some of which are not contradictory to the co-integration. See Dwyer and Wallace (1992) and Copeland (1991).


is now in the form

(12.43)

The equilibrium errors B'x_{t-1} do not subject Δx_1 to the adjustment process towards equilibrium. The adjustment is made through Δx_2 alone as it should be. Moreover, Π(L) and Γ(L) are block triangular in the form of (12.42). The point will be further discussed in Section 15.4 with a definitive concept of exogeneity.

12.3.6 Long-run parameters and short-run parameters

Classification between the long-run and the short-run parameters is vague. B definitely belongs to the long-run parameters as it defines the long-run equilibrium relation. The adjustment matrix A is important in the movement toward equilibrium, and it contributes via A_⊥ to the formation of common trends as seen from (12.25). On the other hand the Γs in (12.24) definitely belong to the short-run parameters as they describe the short-run dynamics.

In the model that consists of (12.29) and (12.30), of which the error-correction form is (12.38), γ is an element of the adjustment matrix, but β is related to a short-run parameter. Thus what discriminates the partial adjustment and the economic error-correction models, i.e. whether β = γ or β ≠ γ, is related to a short-run parameter.

In the example (12.38) the long-run and the short-run parameters are separated, but the separation may not be complete. In fact a simple model

has the error-correction representation

where b appears at once as the long-run and the short-run parameter. In such a case b is the long-run parameter, and there is no short-run parameter. More generally the two groups of parameters may be functionally related in part, and a reparametrization is advised in Sections 14.1 and 14.3 below.

12.3.7 Stationary elements and the long-run relationship

As was indicated in Section 12.1 above an element of {x_t} may be I(0) even though {x_t} is I(1) as a vector process. In such a case a unit vector is a co-integrating vector, but it cannot represent any relationship whatsoever, long-run or not. We wish to discern long-run relationships from such co-integrating vectors when I(0) variables are included in {x_t}.13 It is assumed below that integration orders of individual variables and the co-integration rank are known. For example, in {(x_{1t}, x_{2t}, x_{3t}, x_{4t})'}, {x_{1t}} and {x_{2t}} are I(0) and {x_{3t}} and {x_{4t}} are I(1). The co-integration ranks that are consistent with this set of integration orders are r = 2 or 3. The long-run relationship is in the form c_3 x_{3t} + c_4 x_{4t} = 0.

Let us begin with the case, r = 2. The co-integration space is generated by

where the matrix of q_{ij} is arbitrary except that it has rank 2. When q_{13} and q_{23} are both restricted to zero, the co-integration space does not involve {x_{3t}} or {x_{4t}}, and hence has nothing to do with a long-run relationship. The property that q_{13} = q_{23} = 0 is an identifiable property of the co-integration space, which is explained in Section 15.2.1. The test (1) in Section 15.2.2 with H = [I_2, 0]' can be used to test for the property. If either one of q_{13} and q_{23} is non-zero, the co-integration space does contain the long-run relationship relating {x_{3t}} and {x_{4t}}.

Then consider the case, r = 3. The co-integration space is generated by

where the matrix of q_{ij} must have rank 3. The condition that q_{13} = q_{23} = q_{33} = 0 is excluded by this rank condition, and the long-run relationship relating {x_{3t}} and {x_{4t}} is necessarily included in the co-integration space.

The above statement on the case, r = 2, can be extended to general cases where the number of I(0) variables is equal to the co-integration rank. There may be more than two I(1) variables, implying the possibility of more than one long-run relationship. Exclusion of all long-run relationships from the co-integration space can be tested by (1) in Section 15.2.2. Needless to say, the above statement on the case, r = 3, can also be generalized.

The properties of long-run relationships thus admitted can be investigated by tests (3)-(8) in Section 15.2.2.

13 Mototsugu Fukushige has suggested to me consideration of this problem.

Highlights of Chapter 12

A k-element vector stochastic process is denoted by {x_t}. {Δx_t} is assumed to be I(0), i.e. {x_t} is at most I(1). Initially it is assumed that E(Δx_t) = 0.

1. In the VMA representation of Δx_t,


the co-integration with rank r is defined by ρ(C(1)) = k - r. Then there exists an r × k matrix B' such that ρ(B') = r and B'C(1) = 0. It can be shown that B'x_t is I(0). B' is a co-integration matrix, and the vector space spanned by the rows of B' is the co-integration space (Section 12.1.1). There also exists a k × r matrix H_1 such that C(1)H_1 = 0 and H_1'H_1 = I_r. Let H_2 be a k × (k - r) matrix such that H_1'H_2 = 0 and H_2'H_2 = I_{k-r}. A representation of (k - r)-dimensional common stochastic trends is f_t = H_2' Σ_{s=1}^{t} ε_s, and they enter into x_t as C(1) Σ_{s=1}^{t} ε_s = C(1)H_2H_2' Σ_{s=1}^{t} ε_s (Section 12.1.2).

2. In the VAR representation for {x_t},

the co-integration with rank r is given by ρ(Π(1)) = r (Section 12.2.1). The equation, det[Π(z)] = 0, has (k - r) unit roots. In particular, Π(1) = 0 if {x_t} is I(1) and r = 0, having k unit roots, and Π(1) has its full rank, k, if {x_t} is I(0) having no unit roots (Section 12.2.2). The definitions of co-integration in 1 and 2 are equivalent. (Section 12.2.1 proves only that the one in 1 implies the one in 2. The reverse implication is found in Engle and Granger (1987), Johansen (1991a), and Banerjee et al. (1993: 146-50).) There are duality relations between the VMA and VAR representations,

(Section 12.2.3).

3. When ρ(Π(1)) = r there exist A and B such that both are k × r, ρ(A) = ρ(B) = r and Π(1) = -AB'. Construct Γ(L) by

Then we have the error-correction representation of co-integration with rank r,

B'x_{t-1} is I(0), and B' annihilates the common stochastic trends (Section 12.2.2). The space of rows of B' is the co-integration space. B'x_t = 0 may be interpreted as the long-run equilibrium relation in the sense of balanced growth (Section 12.3.1). B'x_{t-1} is then the error of equilibrium, and the equilibrium is stable in the sense in which the word is meant in mathematical economics (Section 12.3.4). B'x_{t-1} causes x_t in Granger's sense, and this indicates a role that expectation may possibly play to help the convergence towards the equilibrium (Section 12.3.5). Co-integration is contradictory to some aspect of the efficient-market hypothesis (Section 12.3.5).

4. Construct Π*(L) by

Let A_⊥ and B_⊥ each be k × (k - r), with ρ(A_⊥) = ρ(B_⊥) = k - r, A'A_⊥ = 0, and B'B_⊥ = 0. Then the VMA and the error-correction representations of co-integration are related through

(Section 12.2.3). Common stochastic trends f_t enter into x_t as C(1) Σ_{s=1}^{t} ε_s in the VMA representation, and as

in the error-correction representation (Section 12.2.4). Common trends are not caused by the equilibrium error B'x_{t-1} in the Granger sense (Section 12.2.4).

5. The co-integration space is identified (Section 12.1.1). AB' is identified, but A and B are not separately identified (Section 12.2.3).

6. A co-integration space that annihilates only the stochastic common trends requires the VMA representation of Δx_t to be written as

(Section 12.1.3). The VAR and error-correction representations are substantially complicated, but will be given later. On the other hand a co-integration space that annihilates both linear deterministic trends and stochastic common trends requires the VMA representation of Δx_t,

and the VAR representation of x_t,

(Sections 12.1.3 and 12.2.5). With Π(1) = -AB' the error-correction representation is

Linear deterministic trends are revealed in x_t only when C(1)μ ≠ 0 in the VMA representation, or equivalently A_⊥'μ ≠ 0 in the error-correction representation (Sections 12.1.3 and 12.2.5). B'x_t is I(0) having constant means. For a study of a long-run equilibrium relation it is necessary to confirm that E(B'x_t) = E(B'x_0) = 0 (Sections 12.2.5 and 12.3.4).

7. For a co-integration matrix to annihilate the stochastic common trends only we have to consider

This opens a possibility of quadratic deterministic trends, and we must impose

if we wish to trim the order of polynomial to unity (Section 12.2.5).
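The matrix conditions in item 1 of the highlights can be verified on a small numerical case. The sketch below uses a hypothetical 2 × 2 long-run matrix C(1) of rank one (so k = 2 and r = 1) in which only the second shock has a permanent effect; the matrices `C1`, `B_T`, `H1`, `H2` and the helper functions are illustrative choices, not values taken from the text.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


# Hypothetical C(1): both variables load on the second shock only.
C1 = [[0.0, 1.5],
      [0.0, 1.5]]
B_T = [[1.0, -1.0]]    # B' (r x k): a co-integration matrix, B'C(1) = 0
H1 = [[1.0], [0.0]]    # k x r, C(1)H1 = 0 and H1'H1 = 1
H2 = [[0.0], [1.0]]    # k x (k - r), H1'H2 = 0 and H2'H2 = 1


def common_trend(shocks):
    """Path of f_t = H2' * (sum of shocks up to t) for 2-element shocks."""
    total = [[0.0], [0.0]]
    path = []
    for eps in shocks:
        total = [[total[0][0] + eps[0]], [total[1][0] + eps[1]]]
        path.append(matmul(transpose(H2), total)[0][0])
    return path


if __name__ == "__main__":
    print(matmul(B_T, C1))                         # B'C(1)
    print(matmul(matmul(C1, H2), transpose(H2)))   # C(1)H2H2'
    print(common_trend([(1.0, 2.0), (5.0, -1.0)]))
```

In this example C(1)H_2H_2' reproduces C(1), so the permanent component of x_t is driven entirely by the single common trend f_t, and the vector (1, -1) removes it.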


13

Asymptotic Inference Theories on Co-integrated Regressions*

Throughout the present chapter it will be assumed that integration orders of relevant individual variables have already been determined by the univariate method given in Part I. Even though a system (multivariate) method is also available for the determination, I recommend the univariate method on the ground that it is more flexible in modelling the deterministic trend and short-run dynamics, which are conceptually incidental to the integration order but nevertheless seriously influence the results on the integration order. Chapter 9 has revealed that most macroeconomic variables are I(1) and some may even be I(2) over the period of twenty-five to forty years in the post-war quarterly US data. (The past studies in which the contrary conclusion was reached analysed the historical data.) The chapter also uncovered some difficulties that we face in determining integration orders. Two important ones are (i) possible changes in the orders during the sample period and (ii) effects of some arbitrariness that is inherent in the process of model selection for deterministic trends even when judicious judgement is exercised.

Co-integrating relations can be expressed as regression-like equations having I(1) variables in the regressors and regressands. I have written regression-like because the errors may be correlated with the regressors, in which case the models are not really regression models. The traditional econometric theory must be re-examined for the co-integrating relations.

When integration orders of the variables are determined, they immediately have important implications on the regression model. Suppose that x_t, y_t, and u_t are respectively independent variable, dependent variable, and disturbance in

Assume for simplicity that no deterministic trends are involved. If x_t and y_t are I(1) and I(0) respectively, one can immediately see by matching the integration orders of both sides that u_t is I(1) and x_t and u_t are co-integrated, which is perhaps not a useful model. Likewise in

* I have benefited from comments by Kimio Morimune and Hiro Toda on earlier drafts of the present chapter. Especially Hiro Toda has saved me from an error in an earlier version of Section 13.3. Remaining errors are mine.


if x_{1t} and x_{2t} are both I(1) and y_t is I(0), x_{1t} and x_{2t} must be co-integrated in so far as u_t is I(0). When the dependent variable and (a linear combination of) independent variables have identical integration orders, the equation is called balanced in Banerjee et al. (1993: 164). One could have an unbalanced regression, but one should be aware of the implications that it entails. It seems that most econometric models specify the disturbance to be stationary with zero mean.

The plan of Chapter 13 is as follows. Initially in Section 13.1.1 the spurious and the co-integrated regressions are distinguished. Then various cases of co-integrated regressions are explained in Sections 13.1.2-13.3. Most of the basic points can be demonstrated in Section 13.1, which deals with the case where Δx_t is i.i.d. without a drift. Section 13.2 treats the case where Δx_t is still i.i.d. but has a deterministic drift, and Section 13.3 explains the case where Δx_t is serially correlated. The remarkable result of Phillips (1991a) will be presented with emphasis placed slightly differently from his own. The conventional methods of estimation and testing based upon t- and F-statistics are robust to all different modes of deterministic trends. The robustness to the integration orders is confined to the case of strict exogeneity, but with the co-integration between I(1) variables the applicability of the inference procedure is extended beyond that in I(0) variables with a properly selected set of independent variables in the OLS. In particular we are freed from worrying about the correlation between the regressors and disturbance in so far as the disturbance is stationary. A summary of Sections 13.1-3 is given for a general case at the end of the present chapter.

While co-integration is assumed in Sections 13.1.2-13.3, Sections 13.5-6 discuss testing for the co-integration. Actually Chapter 15 is devoted to this testing, and the method in Johansen (1991a) will be presented there. Sections 13.5-6 of this chapter consider two topics that do not fit well into Chapter 15: an a priori specified co-integrating vector and the reason why available methods other than Johansen (1991a) are not generally recommended for co-integration tests.

The above summary has skipped Section 13.4. What is presented there is that the co-integrated regression models assume not only the co-integration rank but also some basic aspect of the co-integration space that is called the location of non-singular submatrices in B'. In fact, at least the co-integration rank can be estimated from the available data by the method developed in Johansen (1988, 1991a) as will be explained in Chapter 15. However, the method has limited capability when applied to economic time-series data of the length actually available (see Sections 15.1.4-5 below). The rank and the basic aspect of the space as specified by economic theories may well be accepted (unless they are definitively rejected by the Johansen method), thus concentrating verification on the remaining aspects of the co-integration space. The issue is just one example of the basic one discussed in Chapter 11, and the discussion will be continued in Chapter 15.


Readers will also find in Section 13.4.1 an application to the US post-war quarterly data of income and consumption. These variables have been found I(1) in Part I, and most of the points that I like to emphasize can be brought up in connection with this application.

The present chapter also aids the understanding of inference theories on the co-integration space in Chapter 15 after the co-integration rank is determined. Section 13.1 also contains an important point on the ordinary regression model, say, for cross-section data, that seems little known among econometricians and even denied in many econometric textbooks.

An important topic that is left unexplored is to investigate how robust all inference methods are when the dominating characteristic root deviates from unity to 0.9, because the roots between 0.9 and 1.0 cannot be distinguished effectively from unity with T = 100 as demonstrated in Chapter 8.

13.1 Pure Random Walk

A scalar Wiener process has been used in Part I. It can be extended to the vector k-element process. The standard vector Wiener process w(r) is distributed in N(0, rI_k), and w(s_2) - w(s_1) and w(r) - w(s_3) are independent if 0 < s_1 < s_2 < s_3 < r. Suppose that Σ is a k × k positive definite matrix. Let Σ^{1/2} be either a symmetric, positive definite matrix such that Σ^{1/2}Σ^{1/2} = Σ, or a lower triangular matrix such that Σ^{1/2}Σ^{1/2}' = Σ. Σ^{1/2}w(r) is called the vector Wiener process with the covariance matrix Σ. Indeed, it is distributed in N(0, rΣ). I shall use the notation, w(r, Σ), with the lower-case letter w because it is a stochastic vector rather than a matrix. If Σ is diagonal, elements of w(r, Σ) are mutually independent because w( ) is Gaussian. Partition a general positive definite Σ as

where Σ_{11} is k_1 × k_1 and k_2 = k - k_1. Then the following two Wiener processes,

(13.1a)

(13.1b)

are mutually independent. In (13.1b) the part of the last k_2 elements of w(r, Σ) that is related to the first k_1 elements is subtracted from the last k_2 elements. The Wiener process (13.1b) has the covariance matrix Σ_{22} - Σ_{21}Σ_{11}^{-1}Σ_{12} (see Phillips 1989, lemma 3.1).

Suppose that a k-element vector process {ε_t} is i.i.d. with E(ε_t) = 0 and

Then in the same way as in (4.2) of Part I we have as T → ∞

(13.2)

Let


The k elements of T^{-3/2} Σ_t v_t are jointly distributed asymptotically the same as the k elements of ∫_0^1 w(r, Σ)dr. Also in the same way as in (4.3) of Part I we have as T → ∞

as T → ∞. The Σ within w( , ) will often be omitted in the following. See Phillips and Durlauf (1986) and Park and Phillips (1988) for the above asymptotic theory; also see Phillips and Ouliaris (1990) for other related formulae.
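The two matrix facts used in constructing w(r, Σ), namely the lower-triangular square root and the block-diagonal covariance obtained after subtracting the part of w_2 related to w_1, can be checked on a 2 × 2 case. This is a minimal sketch with a hypothetical Σ and scalar blocks (k_1 = k_2 = 1); the function names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def cholesky_2x2(s):
    """Lower-triangular L with L L' = s for a 2 x 2 positive definite s."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def partialled_covariance(s):
    """Covariance of (w1, w2 - (s21 / s11) * w1) when (w1, w2) has covariance s."""
    m = [[1.0, 0.0], [-s[1][0] / s[0][0], 1.0]]
    return matmul(matmul(m, s), transpose(m))


if __name__ == "__main__":
    sigma = [[4.0, 2.0], [2.0, 3.0]]      # hypothetical covariance matrix
    print(cholesky_2x2(sigma))
    print(partialled_covariance(sigma))   # off-diagonal terms vanish
```

The second diagonal element of the transformed covariance equals Σ_{22} - Σ_{21}Σ_{11}^{-1}Σ_{12}, and the zero off-diagonal terms are what deliver independence in the Gaussian case.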

The following lemma is due to Park and Phillips (1988) and Park (1992).

LEMMA 13.1 (i) Partition w(r, Σ) into w_1 and w_2, where w_2 is a scalar and w_1 has (k - 1) elements. Suppose Σ is diag[Σ_{11}, σ_{22}] where Σ_{11} is (k - 1) × (k - 1) positive definite. Then conditionally upon G = ∫ w_1(r)w_1(r)'dr, ∫ w_1(r)dw_2(r) is distributed in N(0, σ_{22}G). (ii) Partition w(r, Σ) into (w_1(r, Σ)', w_2(r, Σ)')', where w_1 and w_2 have k_1 and k_2 (= k - k_1) elements. Suppose that Σ is diag[Σ_{11}, Σ_{22}] where Σ_{11} and Σ_{22} are respectively k_1 × k_1 and k_2 × k_2 positive definite. Then conditionally upon G = ∫ w_1(r)w_1(r)'dr, vec ∫ w_1(r)dw_2(r)' is distributed in N(0, Σ_{22} ⊗ G), where vec A is the column vector constructed by stacking columns of A.

I shall present a reasoning that may be useful to show the plausibility ofthis lemma. Suppose that two scalar processes {s\t} and {£2,} are mutuallyindependent Gaussian, and each i.i.d with zero mean and variances a\\ and 022respectively. Let v\, = Y^'s=i £is- For fixed values of en,..., £17-, T~l Y^,vit-i£2tis Gaussian with zero mean and variance, 022 7"~2 ]C vu-i- Tfl's distributiondepends not on the values of e n , . . • , £\r-i individually but on the value ofX^vi(-i only- Therefore the distribution of T~l ^2vit-i^2t conditional upon

The exposition that follows uses the minimum number of variables necessaryto demonstrate the basic point involved in each topic. The extension to higherdimensions is straightforward in every case.

13.1.1 Spurious regression and co-integrated regression

Suppose that a bivariate process {(EU, £2;)} is i.i.d. with zero mean vector andthe covariance matrix E, which is positive definite. L

(13.3)

The k(k+1 )/2 distinct elements of T 2 Y^, vtv't are jointly distributed asymptoti-cally the same as the corresponding elements of JQ w (r, E) w (r, E)' dr. Finally,as in (4.7) of Part I

(13.4)

And and

Let $x_t = v_{1t}$. Consider two alternative models to generate $y_t$. One is

$$y_t = bx_t + \varepsilon_{2t}, \tag{13.5}$$

and the other is

$$y_t = bx_t + v_{2t}. \tag{13.6}$$

In (13.5) $x_t$ and $y_t$ are co-integrated, but they are not in (13.6). Having $T$ observations of $(x_t, y_t)$, compute the OLS estimator $\hat b = (\sum x_t^2)^{-1}\sum x_t y_t$. In (13.5)

$$\hat b - b = \left(\sum x_t^2\right)^{-1}\sum x_t\varepsilon_{2t}, \tag{13.7}$$

and in (13.6)

$$\hat b - b = \left(\sum x_t^2\right)^{-1}\sum x_t v_{2t}. \tag{13.8}$$

Let $w(r) = (w_1(r), w_2(r))'$ be the two-elements Wiener process with covariance matrix $\Sigma$. As for (13.7)

$$T(\hat b - b) \xrightarrow{D} \left(\int_0^1 w_1(r)^2\,dr\right)^{-1}\left(\int_0^1 w_1(r)\,dw_2(r) + \sigma_{12}\right). \tag{13.7'}$$

As for (13.8)

$$\hat b - b \xrightarrow{D} \left(\int_0^1 w_1(r)^2\,dr\right)^{-1}\int_0^1 w_1(r)w_2(r)\,dr. \tag{13.8'}$$

Note that (13.7') deals with $T(\hat b - b)$ while (13.8') refers to $(\hat b - b)$. In (13.8') the error of the OLS estimator does not converge in probability to zero, nor to any constant, but remains a random variable no matter how large $T$ may be.¹ In (13.7') $T(\hat b - b)$ converges to a random variable, and $(\hat b - b)$ is $O_p(T^{-1})$, i.e. $\hat b$ is nearly $T$-consistent. (It should not be called $T$-consistent because the right-hand side of (13.7') generally has a non-zero mean, but I shall use the term, $T$-consistency, for brevity.)

¹ See Phillips (1989) for a further analysis of the right-hand side of (13.8').

The regression between non-co-integrated $I(1)$ variables such as (13.6) is called the spurious regression. The peculiar behaviour of the OLS $\hat b$ when the true $b$ is zero was noted in Granger and Newbold (1974) by a simulation study, and explained mathematically as above in Phillips (1986). In fact the $t$-statistic to test for $b = 0$ diverges as $T \to \infty$ because if $T$ is large the statistic is approximately

$$T^{1/2}\,\frac{T^{-2}\sum v_{1t}v_{2t}}{\left(T^{-2}\sum (v_{2t} - \hat b v_{1t})^2\right)^{1/2}\left(T^{-2}\sum v_{1t}^2\right)^{1/2}},$$

where $T^{-2}\sum v_{1t}v_{2t}$, $T^{-2}\sum (v_{2t} - \hat b v_{1t})^2$, and $(T^{-2}\sum v_{1t}^2)^{1/2}$ are each $O_p(1)$. When $T$ is large, the hypothesis $b = 0$ is surely rejected while $b$ is zero in the data-generating process. This motivates the word, 'spurious'. Banerjee et al. (1993: 70-81) re-examine the experiments in Granger and Newbold (1974), and describe how the spurious regressions are manifested in finite samples.
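The divergence of the $t$-statistic in the spurious regression can be illustrated with a small Monte Carlo experiment. The following is only a sketch under assumptions of my own choosing: both shocks are standard normal and mutually independent, the regression has no constant, and the function names (`random_walk`, `t_stat_no_constant`, `spurious_rejection_rate`) are hypothetical.

```python
import math
import random


def random_walk(rng, T):
    """Partial sums v_t = e_1 + ... + e_t of i.i.d. N(0, 1) shocks."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def t_stat_no_constant(x, y):
    """t-statistic for b = 0 in the OLS of y on x without a constant."""
    sxx = sum(xi * xi for xi in x)
    b_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (len(x) - 1)  # squared standard error of regression
    return b_hat / math.sqrt(s2 / sxx)


def spurious_rejection_rate(T, reps=200, seed=0):
    """Share of replications with |t| > 1.96 for two independent random walks."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = random_walk(rng, T)
        y = random_walk(rng, T)  # model (13.6) with true b = 0
        if abs(t_stat_no_constant(x, y)) > 1.96:
            rejections += 1
    return rejections / reps
```

Calling `spurious_rejection_rate(50)` and `spurious_rejection_rate(200)` should show a rejection rate far above the nominal 5 per cent even though the true $b$ is zero, in line with the $O_p(T^{1/2})$ order of the statistic.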

A situation in which we have a spurious regression in practice is that a regression is run between two $I(1)$ variables, while in fact three $I(1)$ variables including the two are co-integrated. Thus a spurious regression is closely related to the misspecification in regard to a list of variables to be included in a regression equation (see Section 15.5 below).

The regression equation on co-integrated variables such as (13.5) will be called the co-integrated regression below. It has a remarkable feature. The OLS is not just consistent but $T$-consistent. This holds true in spite of the fact that the regressor $x_t$ and the error $\varepsilon_{2t}$ are correlated. This result is conspicuously different from the traditional econometric theory dealing with $I(0)$ variables. There we have been troubled by the correlations between endogenous variables and disturbance in the simultaneous equations and/or between measurement errors in the regressors and the equation errors, which lead to inconsistency of the OLS. The $T$-consistency of OLS in the co-integrated regression was first noted in Stock (1987) and Phillips and Durlauf (1986).
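The $T$-consistency claim can be checked numerically by comparing the size of $\hat b - b$ at two sample sizes. The sketch below is an illustration only: the unit variances, the correlation `rho = 0.8` between the two shocks, and the function names are assumptions made for the example.

```python
import math
import random
import statistics


def ols_error(rng, T, b=1.0, rho=0.8):
    """Return b_hat - b for one sample of y_t = b x_t + e2_t, x_t = e1_1 + ... + e1_t."""
    x_level, sxx, sxy = 0.0, 0.0, 0.0
    for _ in range(T):
        e1 = rng.gauss(0.0, 1.0)
        # e2 has unit variance and correlation rho with e1 (a correlated error)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x_level += e1
        y = b * x_level + e2
        sxx += x_level * x_level
        sxy += x_level * y
    return sxy / sxx - b


def median_abs_error(T, reps=400, seed=1):
    """Median of |b_hat - b| across replications."""
    rng = random.Random(seed)
    return statistics.median([abs(ols_error(rng, T)) for _ in range(reps)])
```

If the estimator were only $\sqrt{T}$-consistent, raising $T$ from 50 to 800 would shrink the typical error by a factor of about 4; under $T$-consistency the factor is about 16, and `median_abs_error(50) / median_abs_error(800)` should come out much closer to the latter.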

How to discriminate the spurious and co-integrated regressions will beexplained in Section 13.4.3 below.

Notwithstanding the $T$-consistency of OLS in the co-integrated regression, (13.7') does not have zero expectation, and in fact the bias in (13.7') may be substantial in finite samples such as $T = 100$, as has been shown in Stock (1987) and Banerjee et al. (1993: 215-23). Moreover, unless the relation between $x_t$ and $\varepsilon_{2t}$ is specialized, the limiting distribution of the $t$-statistic depends upon nuisance parameters, i.e. the parameters other than $b$. These defects of the simple OLS in (13.7) have motivated enormous progress in the theory of co-integrated regression, and this progress will be explained in the remainder of the present chapter. If the OLS is modified in a simple manner, we can have very simple inference procedures for estimation and testing that use only the standard normal distribution and $\chi^2$ distributions. And problems due to correlations between regressors and disturbance in simultaneous equations or observation errors are all dissolved.

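13.1.2 Co-integrated regression with an uncorrelated error

In the regression model, say $y_t = bx_t + u_t$, $u_t$ is said to be an uncorrelated error if $E(x_tu_t) = 0$, and a correlated error if $E(x_tu_t) \neq 0$. In the standard regression model dealing with $I(0)$ variables the OLS of $b$ is consistent if the error is uncorrelated, and inconsistent if the error is correlated.

Suppose that $\{(\varepsilon_{1t}, \varepsilon_{2t})'\}$ is i.i.d. with zero mean vector and covariance matrix $\Sigma$, which is assumed to be diag$[\sigma_{11}, \sigma_{22}]$; in particular $\sigma_{12} = 0$. As before $x_t = v_{1t} = \sum_{s=1}^{t} \varepsilon_{1s}$, and

$$y_t = bx_t + \varepsilon_{2t}, \tag{13.9}$$

which is a co-integrated regression with the uncorrelated error. The bivariate process $(x_t, y_t)$ has a co-integrating vector $(-b, 1)$, and we are interested in the inference about this vector.

Suppose that $\{(x_t, y_t)'\}$ is observable, and let $\hat b$ be the OLS of $b$. Let $(w_1(r), w_2(r))'$ have the covariance matrix diag$[\sigma_{11}, \sigma_{22}]$. Setting $\sigma_{12} = 0$ in (13.7'), we have

$$T(\hat b - b) \xrightarrow{D} \xi \equiv \left(\int_0^1 w_1(r)^2\,dr\right)^{-1}\int_0^1 w_1(r)\,dw_2(r), \tag{13.10}$$

but $w_1(r)$ and $w_2(r)$ being independent introduces further important simplification. Let us write

$$g \equiv \left(\int_0^1 w_1(r)^2\,dr\right)^{-1}. \tag{13.11}$$

By virtue of Lemma 13.1, $\xi$ in (13.10) is distributed in $N(0, \sigma_{22}g)$ conditionally upon $g$. The conditional p.d.f. is

$$(2\pi\sigma_{22}g)^{-1/2}\exp\left(-\frac{\xi^2}{2\sigma_{22}g}\right).$$

Therefore writing $f(g)$ for the marginal p.d.f. of $g$, the marginal p.d.f. of $\xi$ is

$$\int_0^\infty (2\pi\sigma_{22}g)^{-1/2}\exp\left(-\frac{\xi^2}{2\sigma_{22}g}\right)f(g)\,dg. \tag{13.12}$$

This kind of distribution will be called the mixed Gaussian in the present book. It is a mixture of zero mean normal distributions with different variances. In particular $E(\xi) = 0$.

Consider the $t$-statistic to test for $b = b^0$,

$$t = \frac{\hat b - b^0}{s\left(\sum x_t^2\right)^{-1/2}}, \tag{13.13}$$

where $s$ is the standard error of regression. If $b^0$ is the true value of $b$, plim $s^2 = \sigma_{22}$ because plim $\hat b = b^0$. Thus

$$t \xrightarrow{D} \eta \equiv \sigma_{22}^{-1/2}g^{-1/2}\xi = \sigma_{22}^{-1/2}\left(\int_0^1 w_1(r)^2\,dr\right)^{-1/2}\int_0^1 w_1(r)\,dw_2(r). \tag{13.14}$$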

Conditionally upon $g$ in (13.11) $\eta$ is found to be distributed in $N(0, 1)$ so that $\eta$ is distributed, so to speak, in the mixed $N(0, 1)$ with the conditioning variable $g$. However the distribution of $N(0, 1)$ does not depend on $g$. In other words the conditional distribution of $\eta$ is invariant throughout all values of the conditioning variable $g$. Therefore $\eta$ and $g$ are independent, and the marginal p.d.f. of $\eta$ is also $N(0, 1)$. The $t$-statistic based on the mixed Gaussian estimator is asymptotically $N(0, 1)$ by virtue of the scale normalization involved in the construction of the $t$-statistic.

So far we have considered a hypothesis testing, but it goes without saying that the $t$-statistic can also be used to construct a confidence interval on $b$. See also Remark 6 in Section 13.4.2.
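The claim that the $t$-statistic is asymptotically $N(0, 1)$ although $T(\hat b - b)$ is only mixed Gaussian can be checked by simulation. This is a sketch under assumptions of my own: both shocks are standard normal and independent ($\sigma_{12} = 0$), the regression has no constant, and the function names are hypothetical.

```python
import math
import random


def t_statistic(rng, T, b=1.0):
    """t-statistic for b = b0 (the true value) in y_t = b x_t + e2_t, e1 and e2 independent."""
    x_level, xs, ys = 0.0, [], []
    for _ in range(T):
        x_level += rng.gauss(0.0, 1.0)  # x_t = v_1t, a random walk
        xs.append(x_level)
        ys.append(b * x_level + rng.gauss(0.0, 1.0))  # uncorrelated error
    sxx = sum(x * x for x in xs)
    b_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    s2 = sum((y - b_hat * x) ** 2 for x, y in zip(xs, ys)) / (T - 1)
    return (b_hat - b) / math.sqrt(s2 / sxx)


def rejection_rate(T=200, reps=1000, seed=2):
    """Share of |t| > 1.96; close to 0.05 if the t-statistic is roughly N(0, 1)."""
    rng = random.Random(seed)
    return sum(abs(t_statistic(rng, T)) > 1.96 for _ in range(reps)) / reps
```

`rejection_rate()` should be close to the nominal 5 per cent. The same conditioning argument as in the text applies in the finite sample: given the path of $x_t$, the statistic is a conventional $t$-ratio with a Gaussian error.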

The above result is due to Kramer (1986) and Park and Phillips (1988). The result is easily extended to a regression with $k$ regressors. Let $\{(\varepsilon_{1t}, \ldots, \varepsilon_{kt}, \varepsilon_{(k+1)t})\}$ be i.i.d. with zero mean vector and covariance matrix $\Sigma$, and suppose that $\Sigma$ is block-diagonal, diag$[\Sigma_{11}, \sigma_{k+1}^2]$, where $\Sigma_{11}$ is $k \times k$ and non-singular. The regressors, $x_t' = (x_{1t}, \ldots, x_{kt})$, are constructed by $x_{it} = \sum_{s=1}^{t} \varepsilon_{is}$, $i = 1, \ldots, k$. They are not co-integrated. This is because $\Sigma_{11}$ is non-singular and what corresponds to $C(1)$ in the MA representation (12.1) is $I_k$ in the present case. The $y_t$ is constructed by

$$y_t = b'x_t + \varepsilon_{(k+1)t}. \tag{13.15}$$

Let $\hat b$ be the OLS in (13.15). Let $w^*(r)$ be the $(k + 1)$ elements Wiener process with the covariance matrix $\Sigma = $ diag$[\Sigma_{11}, \sigma_{k+1}^2]$, and partition $w^*(r)$ into $w(r)$ and $w_{k+1}(r)$, where $w(r)$ has $k$ elements. Then

$$T(\hat b - b) \xrightarrow{D} \xi \equiv \left(\int_0^1 w(r)w(r)'\,dr\right)^{-1}\int_0^1 w(r)\,dw_{k+1}(r).$$

Conditionally upon $G = (\int w(r)w(r)'\,dr)^{-1}$, $\xi$ is distributed in $N(0, \sigma_{k+1}^2G)$. Consider a hypothesis, $Hb^0 = h$, where a known matrix $H$ is $k_1 \times k$, $\rho(H) = k_1$, and $h$ is a known vector. The $F$-statistic multiplied by $k_1$ is

$$(H\hat b - h)'\left[s^2H(X'X)^{-1}H'\right]^{-1}(H\hat b - h), \tag{13.16}$$

where $X'X = \sum x_tx_t'$, and $s^2$ is the square of the standard error of regression. Suppose that the hypothesis holds true. Then $TH(\hat b - b^0)$ converges to a mixed Gaussian distribution, which is $N(0, \sigma_{k+1}^2HGH')$ conditionally upon $G$. Moreover, $T^2H(X'X)^{-1}H'$ converges to $HGH'$, and $s^2$ to $\sigma_{k+1}^2$. Therefore the limiting distribution of (13.16) is the mixed $\chi^2(k_1)$ with the conditioning variable $G$. Since the distribution of $\chi^2(k_1)$ does not depend on $G$, the marginal asymptotic distribution of (13.16) is also $\chi^2(k_1)$. This result can also be used to set a confidence interval on $Hb$. When $k_1 = 1$ and $H$ is a unit vector, one can use the standard normal distribution.

Since $\{(\varepsilon_{1t}, \ldots, \varepsilon_{kt}, \varepsilon_{(k+1)t})\}$ is assumed to be i.i.d. with $\Sigma = $ diag$[\Sigma_{11}, \sigma_{k+1}^2]$, $x_t$ in (13.15) is strictly exogenous. Later in Section 13.3 it will be shown that the strict exogeneity of $x_t$ guarantees the mixed $\chi^2$ property of an estimator of $b$ even when the serial correlations are allowed.

Needless to say, if $\{x_t\}$ in (13.9) or (13.15) is $I(0)$ and strictly exogenous the $t$- and $F$-statistics are asymptotically distributed in $N(0, 1)$ and $\chi^2$ respectively. Therefore it can be said that the conventional inference procedures based on the OLS are robust to integration orders in the case of strict exogeneity.

One can further extend (13.15) to the case where $y_t$ on its left-hand side is a vector rather than a scalar. However, $\{x_t\}$ should not be co-integrated.

A remaining problem is how to test for the absence of co-integration in $\{(x_{1t}, \ldots, x_{kt})\}$ and presence of co-integration in $\{(x_{1t}, \ldots, x_{kt}, y_t)\}$. It will be given in Section 13.4.3 below.

The main result in the present section is analogous to an important point on the simple regression model, say, for cross-section data, which however seems little known among econometricians. (Indeed many textbooks state that the result does not hold.) In $y = X\beta + u$, $X$ is an $n \times k$ random matrix, $y$ and $u$ are each $n \times 1$ random vectors, $\beta$ is a $k \times 1$ unknown parameter, and, conditionally upon $X$, $u$ is distributed in $N(0, \sigma^2I_n)$. Construct the $F$-statistic to test for $H\beta = h$, where $H$ is $k_1 \times k$. Then the marginal distribution of the $F$-statistic is $F(k_1, n - k)$. This is because, conditionally upon $X$, the $F$-statistic is distributed in $F(k_1, n - k)$, but this distribution does not depend upon $X$. Both the confidence interval and significance test based upon the $F$-statistic are valid irrespective of the distribution of $X$. The conditional normality assumption can be removed in the asymptotic theory in which $n$ is taken to infinity.
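The step from the conditional to the marginal distribution can be written out in one line. In the sketch below, $\Phi_{k_1, n-k}$ denotes the c.d.f. of $F(k_1, n - k)$ and $F$ the test statistic; both symbols are introduced here only for the illustration.

$$P(F \le c) = E_X\left[P(F \le c \mid X)\right] = E_X\left[\Phi_{k_1, n-k}(c)\right] = \Phi_{k_1, n-k}(c) \quad \text{for every } c.$$

The last equality holds because $\Phi_{k_1, n-k}(c)$ is a constant with respect to $X$. The same argument, with $g$ or $G$ in place of $X$, underlies the passage from the mixed $N(0, 1)$ and mixed $\chi^2$ to the marginal $N(0, 1)$ and $\chi^2$ in the co-integrated regression.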

13.1.3 Co-integrated regression with a correlated error

Let us return to the case of a single regressor and a single regressand, and continue with $\{(\varepsilon_{1t}, \varepsilon_{2t})'\}$ being i.i.d. with zero mean vector and covariance matrix $\Sigma$. But here I consider the case where $\Sigma$ is not necessarily diagonal. The model is still (13.5) i.e. (13.9), which is reproduced here,

$$y_t = bx_t + \varepsilon_{2t}, \tag{13.9}$$

where $x_t = v_{1t} = \sum_{s=1}^{t} \varepsilon_{1s}$. The limiting distribution of the OLS is given in (13.7').

The right-hand side of (13.7') is not mixed Gaussian. The OLS is of little use, but Phillips (1991a) and Phillips and Loretan (1991) present an estimator which is asymptotically mixed Gaussian.²

² I have benefited from a suggestion by Hiro Toda on the expository device in this subsection.

Let $\sigma_{11}$, $\sigma_{12}$, and $\sigma_{22}$ be the elements of $\Sigma$. Even though $\varepsilon_{2t}$ is correlated with $\Delta x_t = \varepsilon_{1t}$ in (13.9), we can eliminate that part of $\varepsilon_{2t}$ which is linearly related to $\Delta x_t$ by introducing $\Delta x_t$ in the regressor set. In fact (13.9) can be rewritten as

$$y_t = bx_t + \sigma_{21}\sigma_{11}^{-1}\Delta x_t + \varepsilon_{2.1t}, \qquad \varepsilon_{2.1t} \equiv \varepsilon_{2t} - \sigma_{21}\sigma_{11}^{-1}\varepsilon_{1t}. \tag{13.9'}$$

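Writing $c = \sigma_{21}\sigma_{11}^{-1}$, let $(\hat b, \hat c)'$ be the OLS estimate of $(b, c)'$ along (13.9') with $y_t$ as the dependent variable and $(x_t, \Delta x_t)$ as the independent variables. Let $x' = (x_2, \ldots, x_T)$, $y' = (y_2, \ldots, y_T)$, $\Delta x' = (\Delta x_2, \ldots, \Delta x_T)$, $X = (x, \Delta x)$, $\varepsilon_{2.1}' = (\varepsilon_{2.12}, \ldots, \varepsilon_{2.1T})$, and $D_T = $ diag$[T, T^{1/2}]$. Then

$$D_T\begin{bmatrix} \hat b - b \\ \hat c - c \end{bmatrix} = \left(D_T^{-1}X'XD_T^{-1}\right)^{-1}D_T^{-1}X'\varepsilon_{2.1}. \tag{13.17}$$

Let $(w_1(r), w_2(r))'$ be the Wiener process with covariance matrix $\Sigma$, and let $(w_1(r), w_{2.1}(r))'$ be the Wiener process that is derived from $(w_1(r), w_2(r))'$ as (13.1a) and (13.1b). Then the process, $(w_1(r), w_{2.1}(r))'$, has the covariance matrix, diag$[\sigma_{11}, \sigma_{22} - \sigma_{21}^2\sigma_{11}^{-1}]$. Concerning the elements in (13.17), it is seen that

$$T^{-2}x'x \xrightarrow{D} \int_0^1 w_1(r)^2\,dr, \quad T^{-3/2}x'\Delta x \xrightarrow{P} 0, \quad T^{-1}\Delta x'\Delta x \xrightarrow{P} \sigma_{11}, \quad T^{-1}x'\varepsilon_{2.1} \xrightarrow{D} \int_0^1 w_1(r)\,dw_{2.1}(r).$$

Substituting these relations in (13.17) we see that

$$T(\hat b - b) \xrightarrow{D} \left(\int_0^1 w_1(r)^2\,dr\right)^{-1}\int_0^1 w_1(r)\,dw_{2.1}(r). \tag{13.18}$$

Since the covariance matrix of $(w_1(r), w_{2.1}(r))'$ is diagonal, $T(\hat b - b)$ is asymptotically $N(0, (\sigma_{22} - \sigma_{21}^2\sigma_{11}^{-1})g)$ conditionally upon $g = (\int w_1(r)^2\,dr)^{-1}$. In particular the right-hand side of (13.18) has zero expectation, while (13.7') based on the OLS (13.7) does not have zero expectation.

Thus the OLS to be run is

$$y_t = \hat bx_t + \hat c\Delta x_t + \text{error}. \tag{13.19}$$

The $t$-statistic to test for $b = b^0$ in (13.19) is

$$t = \frac{\hat b - b^0}{s\left([(X'X)^{-1}]_{11}\right)^{1/2}}, \tag{13.20}$$

where $s$ is the standard error of regression in (13.19) and $[(X'X)^{-1}]_{11}$ is the (1,1) element of $(X'X)^{-1}$. Then $T^2[(X'X)^{-1}]_{11}$ converges to $g$.

It also follows from (13.9') that plim $s^2 = \sigma_{22} - \sigma_{21}^2\sigma_{11}^{-1}$. Combining these with (13.18) it follows that (13.20) is asymptotically distributed in $N(0, 1)$.

Incidentally the limit distribution regarding $c$ also follows from (13.17). Since

$$T^{-1/2}\Delta x'\varepsilon_{2.1} \xrightarrow{D} N\left(0, \sigma_{11}(\sigma_{22} - \sigma_{21}^2\sigma_{11}^{-1})\right),$$

it is seen that

$$T^{1/2}(\hat c - c) \xrightarrow{D} N\left(0, \sigma_{11}^{-1}(\sigma_{22} - \sigma_{21}^2\sigma_{11}^{-1})\right).$$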

Phillips (1991a) derives the above result through the maximum-likelihood estimation of $b$ in $y_t = bx_{t-1} + u_t$, $x_t = \sum_{s=1}^{t} \varepsilon_{1s}$, $u_t = b\varepsilon_{1t} + \varepsilon_{2t}$. He also shows that when the MLE is mixed Gaussian it is the optimal estimator. Its explanation however is beyond the scope of the present book.

Generally we have no a priori information about $\Sigma$ in practice. Therefore the method in the present section rather than the one in Section 13.1.2 should be recommended.

If $x_t$ and $y_t$ in (13.9) are $I(0)$ and if $x_t$ and $\varepsilon_{2t}$ are correlated, any OLS estimators, including that on (13.19), do not yield consistent estimation of $b$. The OLS is not robust to integration orders when the error is correlated. It can be said that the co-integration among $I(1)$ variables extends the applicability of OLS beyond that in the $I(0)$ variables by freeing us from worry about the correlation between the regressor and disturbance. The instrumental variable estimator has been used for the regression model with a correlated error when the variables are $I(0)$. The instrumental variable estimator applied to the co-integrated regression with a correlated error is $T$-consistent, but some modifications are required to make it mixed Gaussian³ (see Phillips and Hansen (1990)).

Extending (13.9) to $y_t = b'x_t + \varepsilon_{2t}$ with a $k$-element vector $x_t$ is straightforward if $\{x_t\}$ is not co-integrated. Moreover, $y_t$ can be extended to a vector under the same condition.
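The effect of adding $\Delta x_t$ to the regressor set can be seen in a small simulation that compares the plain OLS of $y_t$ on $x_t$ with the OLS of $y_t$ on $(x_t, \Delta x_t)$. This is a sketch only: the unit variances, the correlation `rho = 0.9`, the starting value $x_0 = 0$, and all function names are assumptions made for the illustration.

```python
import math
import random
import statistics


def simulate(T, b, rho, rng):
    """Generate x_t = v_1t and y_t = b x_t + e2_t with corr(e1_t, e2_t) = rho."""
    x_level, x, dx, y = 0.0, [], [], []
    for _ in range(T):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x_level += e1
        x.append(x_level)
        dx.append(e1)  # Delta x_t = e1_t because x_0 = 0
        y.append(b * x_level + e2)
    return x, dx, y


def plain_ols(x, y):
    """OLS slope of y on x alone."""
    return sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)


def augmented_ols(x, dx, y):
    """OLS of y on (x, Delta x); returns b_hat and its conventional standard error."""
    sxx = sum(a * a for a in x)
    sxd = sum(a * d for a, d in zip(x, dx))
    sdd = sum(d * d for d in dx)
    sxy = sum(a * c for a, c in zip(x, y))
    sdy = sum(d * c for d, c in zip(dx, y))
    det = sxx * sdd - sxd * sxd
    b_hat = (sdd * sxy - sxd * sdy) / det
    c_hat = (sxx * sdy - sxd * sxy) / det
    rss = sum((c - b_hat * a - c_hat * d) ** 2 for a, d, c in zip(x, dx, y))
    s2 = rss / (len(x) - 2)
    return b_hat, math.sqrt(s2 * sdd / det)  # (1,1) element of (X'X)^{-1} is sdd / det


def compare(T=100, reps=800, b=1.0, rho=0.9, seed=3):
    """Medians of T(b_hat - b) for both regressions and the size of the augmented t-test."""
    rng = random.Random(seed)
    plain, augmented, rejections = [], [], 0
    for _ in range(reps):
        x, dx, y = simulate(T, b, rho, rng)
        plain.append(T * (plain_ols(x, y) - b))
        b_hat, se = augmented_ols(x, dx, y)
        augmented.append(T * (b_hat - b))
        rejections += abs((b_hat - b) / se) > 1.96
    return statistics.median(plain), statistics.median(augmented), rejections / reps
```

With these settings the median of $T(\hat b - b)$ from the plain OLS should be clearly positive, reflecting the non-zero mean of (13.7'), while the augmented regression should be centred near zero with a $t$-test rejection rate near 5 per cent.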

13.1.4 An example of the simultaneous equations model⁴

The following example provides a number of interesting problems to consider. $\{(\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t})\}$ is i.i.d. with zero mean vector and the covariance matrix $\Sigma$. Let $x_t = \sum_{s=1}^{t} \varepsilon_{1s}$,

$$y_t = bx_t + \varepsilon_{2t}, \tag{13.21a}$$

$$z_t = c_1x_t + c_2y_t + \varepsilon_{3t}. \tag{13.21b}$$

I assume that $b \neq 0$, and $c_1 + bc_2 \neq 0$ so that $x_t$, $y_t$, and $z_t$ are each $I(1)$. (13.21a) and (13.21b) contain two linearly independent co-integrating vectors, $(-b, 1, 0)$ and $(-c_1, -c_2, 1)$ in the three-elements vector process, $\{(x_t, y_t, z_t)\}$. The two variables on the right-hand side of (13.21b) are co-integrated.

(13.21a) and (13.21b) may be looked upon as a simultaneous equations model. The coefficient of $z_t$ in (13.21a) is a priori specified zero. This a priori information identifies $b$ in (13.21a) because mixing (13.21a) and (13.21b) to get a new (13.21a) necessarily introduces $z_t$ into (13.21a). On the other hand $(c_1, c_2)$ in (13.21b) are not identified unless we have a priori information about the correlation between $\varepsilon_{2t}$ and $\varepsilon_{3t}$. If $\sigma_{23} = 0$ is known a priori, it precludes mixing of (13.21a) and (13.21b) to get a new (13.21b), and $(c_1, c_2)$ is identified. In fact, if $\sigma_{23} = 0$, and if $x$ is weakly exogenous with respect to $b$, $c_1$, and $c_2$, (13.21a) and (13.21b) are what is known as the recursive system in the field of simultaneous equations, in which the OLS is consistent in the case of $I(0)$ variables.

Rewrite (13.21a) and (13.21b) as

$$y_t = bx_t + \varepsilon_{2t}, \qquad z_t = (c_1 + bc_2)x_t + (c_2\varepsilon_{2t} + \varepsilon_{3t}). \tag{13.22}$$

This is analogous to a reduced form. Moreover, each of the two equations in (13.22) is in the form that has been analysed in Sections 13.1.2-3. If $\sigma_{12} = \sigma_{13} = 0$ is a priori known, the method in Section 13.1.2 can be used to estimate $b$ and $c_1 + bc_2$. Otherwise the method in Section 13.1.3 should be used.

Whatever the model is, OLS calculations can be run with $z_t$ as the dependent and $(x_t, y_t)$ as the independent variables. In the present case it tries to estimate an unidentified parameter $(c_1, c_2)$ in (13.21b). Following Park and Phillips (1989) let us ask ourselves how $(\hat c_1, \hat c_2)$ is related to $(c_1, c_2)$. The data-generating model is (13.21a) and (13.21b). Let $x$, $y$, and $z$ be each $T \times 1$ vectors of the data. Then (13.21b) is

$$z = c_1x + c_2y + \varepsilon_3. \tag{13.23}$$

This is a nice place to introduce a transformation to canonical variables each having different orders in probability, which was initiated in Sims, Stock, and Watson (1990). Let

$$H = (1 + b^2)^{-1/2}\begin{bmatrix} -b & 1 \\ 1 & b \end{bmatrix}.$$

Then $HH' = I_2$. Let $(\xi, \eta) = (x, y)H'$ and $(\gamma_1, \gamma_2)' = H(c_1, c_2)'$, so that (13.23) becomes

$$z = \gamma_1\xi + \gamma_2\eta + \varepsilon_3. \tag{13.23'}$$

³ Phillips and Hansen (1990) modify the instrumental variable estimator just as they modify the OLS, which is explained in Appendix 6. Without the modification the instrumental variable estimator is not mixed Gaussian.

⁴ I have benefited from discussions with Koichi Maekawa and Taku Yamamoto on the writing of this subsection.

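Then $\xi = (1 + b^2)^{-1/2}\varepsilon_2$ and $\eta = (1 + b^2)^{-1/2}((1 + b^2)x + b\varepsilon_2)$. The point is that we have transformed $(x, y)$ into $(\xi, \eta)$ by using the co-integrating relation (13.21a) so that $\xi$ is $I(0)$ and $\eta$ is $I(1)$.

We are concerned with $(\hat c_1, \hat c_2) - (c_1, c_2)$. Regress $z$ upon $(\xi, \eta)$ to get $(\hat\gamma_1, \hat\gamma_2)$. Since $b$ is unknown $(\hat\gamma_1, \hat\gamma_2)$ is not an estimator, but it is useful to analyse $(\hat c_1, \hat c_2)$. Since $(\hat c_1, \hat c_2)' = H(\hat\gamma_1, \hat\gamma_2)'$, we have

$$\begin{bmatrix} \hat c_1 - c_1 \\ \hat c_2 - c_2 \end{bmatrix} = H\begin{bmatrix} \hat\gamma_1 - \gamma_1 \\ \hat\gamma_2 - \gamma_2 \end{bmatrix}, \qquad \begin{bmatrix} \hat\gamma_1 - \gamma_1 \\ \hat\gamma_2 - \gamma_2 \end{bmatrix} = \begin{bmatrix} \xi'\xi & \xi'\eta \\ \eta'\xi & \eta'\eta \end{bmatrix}^{-1}\begin{bmatrix} \xi'\varepsilon_3 \\ \eta'\varepsilon_3 \end{bmatrix}. \tag{13.24}$$

Note that $T^{-3/2}\xi'\eta$ is $O_p(T^{-1/2})$. Investigating (13.24) it is seen that, with $D_T = $ diag$[T^{1/2}, T]$, the moment matrix of $(\xi, \eta)$ pre- and post-multiplied by $D_T^{-1}$ becomes diagonal asymptotically. Therefore plim$(\hat\gamma_1 - \gamma_1) = ($plim $T^{-1}\xi'\xi)^{-1}$ plim $T^{-1}\xi'\varepsilon_3 = (1 + b^2)^{1/2}\sigma_{23}\sigma_{22}^{-1}$, and plim$(\hat\gamma_2 - \gamma_2) = $ plim $(T^{-2}\eta'\eta)^{-1}T^{-2}\eta'\varepsilon_3 = 0$. The conclusion on $((\hat c_1 - c_1), (\hat c_2 - c_2))$ is plim $\hat c_1 = c_1 - b\sigma_{23}\sigma_{22}^{-1}$, plim $\hat c_2 = c_2 + \sigma_{23}\sigma_{22}^{-1}$. If $\sigma_{23} = 0$, the OLS $(\hat c_1, \hat c_2)$ is a consistent estimator. (However, it is $\sqrt{T}$-consistent because $\hat\gamma_1$ is.) Recall that $\sigma_{23} = 0$ enables us to identify $(c_1, c_2)$. Notice also that $\gamma_2 = c_1 + bc_2$ is identified without any a priori information about $\Sigma$.

It is seen that the inference theories involving $I(1)$ are parallel to the standard theories for the stationary processes in so far as the consistency condition is concerned. It is the stationary component, $\xi_t$, that brings this about.⁵

⁵ Maekawa et al. (1993) consider

(*)

where $\{v_t\}$ and $\{\varepsilon_t\}$ are mutually independent and each i.i.d. with zero mean and finite variance. This is a standard econometric model except that $\{z_t\}$ is $I(1)$. While $y_t$ and $z_t$ are co-integrated, $y_{t-1}$ and $z_t$ are also co-integrated on the right-hand side of (*). The situation is analogous to (13.23). In fact

if $y_0 = z_0 = 0$. What corresponds to the stationary component, $\xi_t$, in the text is here $(1 + \tau^2)^{-1/2}w_t$, where $\tau = \beta(1 - \alpha)^{-1}$. When $\rho \neq 0$, $w_t$ and $u_t$ are correlated. What corresponds to $\hat\gamma_1$ in the text is not consistent here, which makes the OLS $(\hat\alpha, \hat\beta)$ in (*) inconsistent. It has been known that the OLS is not consistent when $\rho \neq 0$ and $\{z_t\}$ is $I(0)$. Again it is the stationary component that produces the similarity between $I(0)$ and $I(1)$ cases.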

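When the identifiability is achieved by $\sigma_{23} = 0$, the limit distribution of $(\hat c_1, \hat c_2)$ is different from that in the $I(0)$ case. Since $(\hat c_1 - c_1, \hat c_2 - c_2)' = H(\hat\gamma_1 - \gamma_1, \hat\gamma_2 - \gamma_2)'$ and $\hat\gamma_2 - \gamma_2$ is $O_p(T^{-1})$, $T^{1/2}(\hat c_1 - c_1, \hat c_2 - c_2)'$ is asymptotically equivalent to

$$(1 + b^2)^{-1/2}\begin{bmatrix} -b \\ 1 \end{bmatrix}T^{1/2}(\hat\gamma_1 - \gamma_1),$$

which is asymptotically degenerate Gaussian. We have assumed $b \neq 0$ from the outset.

The above demonstration provides a hint regarding how one can eliminate the assumptions made in Sections 13.1.2-3 above that $\{x_t\}$ should not be co-integrated when it is a vector stochastic process. Wooldridge (1991) and Hamilton (1994: 590-1) present a general proposition, which is specialized in the present example of (13.21a) and (13.21b) to the statement that even though $\{x_t\}$ and $\{y_t\}$ are co-integrated $\{(\hat c_1x_t + \hat c_2y_t)\}$ converges in probability to the image of projection of $\{z_t\}$ on to $\{(x_t, y_t)\}$. Saikkonen (1993) shows a method to estimate a general simultaneous equations model. Davidson (1994) reveals an interesting aspect of identifiability by exclusion of variables in simultaneous equations.

13.1.5 Mixing a stationary regressor in the regressor set

(13.23') is

$$z_t = \gamma_1\xi_t + \gamma_2\eta_t + \varepsilon_{3t}. \tag{13.23''}$$

Here $\xi_t$ is stationary while $\eta_t$ is non-stationary so that (13.23'') is a co-integrating relation with a vector $(\times, -\gamma_2, 1)$ in $(\xi_t, \eta_t, z_t)$, where $\times$ means unspecified. As for the coefficient on the stationary $\xi_t$, it is seen from (13.24) that if $E(\xi_t\varepsilon_{3t}) = 0$, $\hat\gamma_1$ is consistent, and that $T^{1/2}(\hat\gamma_1 - \gamma_1)$ is asymptotically normal. The test on $\gamma_1$ is standard. On the other hand, regarding the coefficient on the non-stationary $\eta$ we have

$$T(\hat\gamma_2 - \gamma_2) = (T^{-2}\eta'\eta)^{-1}T^{-1}\eta'\varepsilon_3 + o_p(1),$$

which is like a number of expressions that have appeared previously in the non-stationary case.

In general, suppose that

$$z_t = c_1x_t + c_2y_t + \varepsilon_{3t},$$

where $\{x_t\}$ is stationary, but $y_t = \sum_{s=1}^{t} \varepsilon_{2s}$. $\{(\varepsilon_{2t}, \varepsilon_{3t})\}$ is i.i.d. with zero mean and covariance matrix $\Sigma$, which is not necessarily diagonal. Let $(\hat c_1, \hat c_2)$ be the OLS of $(c_1, c_2)$, and $D_T = $ diag$[T^{1/2}, T]$. Then

$$D_T\begin{bmatrix} \hat c_1 - c_1 \\ \hat c_2 - c_2 \end{bmatrix} = \left(D_T^{-1}\begin{bmatrix} x'x & x'y \\ y'x & y'y \end{bmatrix}D_T^{-1}\right)^{-1}D_T^{-1}\begin{bmatrix} x'\varepsilon_3 \\ y'\varepsilon_3 \end{bmatrix},$$

where $x' = (x_1, \ldots, x_T)$, $y' = (y_1, \ldots, y_T)$, $\varepsilon_3' = (\varepsilon_{31}, \ldots, \varepsilon_{3T})$. Note that plim $T^{-3/2}x'y = 0$. Provided that $E(x_t\varepsilon_{3t}) = 0$, i.e. if the stationary regressor is uncorrelated with the error, then $\hat c_1$ is $\sqrt{T}$-consistent, and

$$T^{1/2}(\hat c_1 - c_1) = (T^{-1}x'x)^{-1}T^{-1/2}x'\varepsilon_3 + o_p(1),$$

which is just the same as in traditional econometrics. On the other hand, $\hat c_2$ is $T$-consistent even when $E(\varepsilon_{2t}\varepsilon_{3t}) \neq 0$. However, it is only when $E(\varepsilon_{2t}\varepsilon_{3t}) = 0$ that $T(\hat c_2 - c_2)$ is mixed Gaussian.

13.2 Deterministic Polynomial Trends

Concerning the vector processes $\{\varepsilon_t\}$ and $\{v_t\}$ constructed in connection with (13.2)-(13.4) we have

$$T^{-5/2}\sum_{t=1}^{T} tv_t \xrightarrow{D} \int_0^1 r\,w(r, \Sigma)\,dr, \tag{13.25}$$

$$T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t \xrightarrow{D} \int_0^1 r\,dw(r, \Sigma). \tag{13.26}$$

They correspond to (6.14) and (6.15) in Part I.

13.2.1 A single regressor

Previous conclusions in Section 13.1 are substantially altered when $x_t$ has a linear trend as well as a stochastic trend. Reconsider the models (13.5) and (13.6) in Section 13.1.1 replacing the data generation of $x_t$ by

$$x_t = \mu t + v_{1t}. \tag{13.27}$$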

As before $\{(\varepsilon_{1t}, \varepsilon_{2t})\}$ is i.i.d. with zero mean vector and covariance matrix $\Sigma$. I use the same definition of $\{(v_{1t}, v_{2t})\}$ as before. As explained in Section 4.2 of Part I the deterministic part dominates the stochastic part in $\sum x_t^2$, and

$$T^{-3}\sum x_t^2 \xrightarrow{P} \mu^2/3.$$

Let us begin with the spurious regression model (13.6) to generate $y_t$. Then

$$T^{1/2}(\hat b - b) \xrightarrow{D} 3\mu^{-1}\int_0^1 r\,w_2(r)\,dr,$$

where $w_2(r)$ is the Wiener process with variance $E(\varepsilon_{2t}^2) \equiv \sigma_{22}$. Thus the OLS is $\sqrt{T}$-consistent and asymptotically Gaussian in spite of the spurious regression. However, the $t$-statistic diverges because, when $b = 0$, $\hat b(\sum x_t^2)^{1/2}$ is $O_p(T)$ while the standard error of regression $s$ is only $O_p(T^{1/2})$, so that the $t$-statistic is $O_p(T^{1/2})$.

The hypothesis, $b = 0$, is surely rejected when $b$ is indeed zero, and the word, 'spurious', maintains its proper meaning even when a linear trend is contained in $x_t$. Incidentally, the above also shows that it is only in the co-integrated regression that the (mixed) Gaussian property of $\hat b - b$ induces the asymptotic $N(0, 1)$ of the $t$-statistic.

Let us then turn to the co-integrated regression, (13.5), to generate $y_t$ from $x_t$, which is constructed in (13.27). $(-b, 1)$ is the co-integrating vector in the process $\{(x_t, y_t)\}$ such that the vector nullifies both the stochastic and deterministic trends. $\{(\varepsilon_{1t}, \varepsilon_{2t})\}$ is i.i.d. with zero mean vector and covariance matrix $\Sigma$. Concerning the OLS,

$$y_t = \hat bx_t + \text{error}, \tag{13.28}$$

the deterministic trend dominates the stochastic trend in $\{x_t\}$, which makes the correlation between $\Delta x_t$ and $\varepsilon_{2t}$ irrelevant. It is seen that

$$T^{3/2}(\hat b - b) \xrightarrow{D} N(0, 3\mu^{-2}\sigma_{22}), \tag{13.29}$$

no matter whether $x_t$ and $\varepsilon_{2t}$ are correlated or uncorrelated. This limit distribution is unconditional Gaussian rather than mixed Gaussian. The asymptotic normality of OLS in the present context has been emphasized in West (1988). The $t$-statistic also converges to $N(0, 1)$ no matter whether the error is correlated or uncorrelated with the regressor.

The co-integrating regression that nullifies both the deterministic and stochastic trends allows a constant term. When we consider

$$y_t = a + bx_t + \varepsilon_{2t}$$

and the corresponding OLS

$$y_t = \hat a + \hat bx_t + \text{error}, \tag{13.28'}$$

all the above results remain unaltered except that $N(0, 3\mu^{-2}\sigma_{22})$ in (13.29) is replaced by $N(0, 12\mu^{-2}\sigma_{22})$. The $t$-statistic is asymptotically $N(0, 1)$.

The above reasoning can be extended to a wide class of modes of deterministic trends in $x_t$. Suppose (i) that $x_t = f_t + u_t$, where $\{f_t\}$ is deterministic and $T^{-\alpha}\sum f_t^2$ converges to $c (> 0)$ as $T \to \infty$, (ii) that $\{u_t\}$ is stochastic and possibly non-stationary, but (iii) that $f_t$ dominates $u_t$. If $\{u_t\}$ is $I(1)$ so that $u_t$ is $O_p(t^{1/2})$, then the condition (iii) means that $\alpha > 1$. Let $y_t = a + bx_t + \varepsilon_{2t}$. Regarding $\hat b$ in (13.28'), $T^{\alpha/2}(\hat b - b)$ is asymptotically $N(0, \sigma_{22}c^{-1})$ unconditionally. The $t$-statistic, which is

$$\frac{\hat b - b}{s\left[\sum (x_t - \bar x)^2\right]^{-1/2}},$$

is asymptotically $N(0, 1)$.

I find some wisdom in the OLS

$$y_t = \hat a + \hat bx_t + \hat c\Delta x_t + \text{error}. \tag{13.28''}$$

In the OLS (13.28) and (13.28') with a finite sample some size of $\mu^2/\sigma_{11}$ is required for the dominance of deterministic trend (of $x$) over the stochastic trend in order to make the correlation between $\Delta x_t = \mu + \varepsilon_{1t}$ and $\varepsilon_{2t}$ irrelevant. In (13.28'') that correlation is eliminated by $\Delta x_t$, and thereby the $t$-statistic on $b$ converges to $N(0, 1)$ even when $\mu^2/\sigma_{11}$ is small or zero. When $\mu = 0$ the limit distribution of $\hat b$ is mixed Gaussian, while it is (simple) Gaussian when $\mu \neq 0$. In either case the $t$-statistic converges to $N(0, 1)$. More generally, for any $f_t$, it is advisable to have $\Delta x_t$ in the regressor set because then the correlation between $\Delta x_t = \Delta f_t + \varepsilon_{1t}$ and $\varepsilon_{2t}$ can be eliminated even when $f_t$ does not dominate $v_{1t}$.

As for the co-integration vector that annihilates only the stochastic trend, we may consider the model

$$y_t = a + bt + cx_t + \varepsilon_{2t} \tag{13.5'}$$

in conjunction with (13.27). $\{(\varepsilon_{1t}, \varepsilon_{2t})\}$ continues to be i.i.d. with zero mean and covariance $\Sigma$. If $b + c\mu \neq 0$, $y_t$ also contains a linear deterministic trend. The vector $(-c, 1)$ for $(x_t, y_t)$ is the co-integration vector that annihilates stochastic trends only. There remains a deterministic trend, $a + bt$, in $y_t - cx_t$.

If $\sigma_{12} = 0$ we may run OLS

$$y_t = \hat a + \hat bt + \hat cx_t + \text{error}. \tag{13.30}$$

It is well known that $\hat c$ is equivalent to the OLS estimate of $c$ in $\tilde y_t = c\tilde x_t + $ residual, where $\tilde y_t$ and $\tilde x_t$ are demeaned and detrended $y_t$ and $x_t$ respectively. $T(\hat c - c)$ is asymptotically mixed Gaussian. However, the conditioning variable is different from (13.11). Here we have

$$T(\hat c - c) \xrightarrow{D} \left(\int_0^1 w^*(r)^2\,dr\right)^{-1}\int_0^1 w^*(r)\,dw_2(r), \tag{13.31}$$

where $w^*(r) = w_1(r) - \int w_1(r)\,dr - 12(r - \tfrac{1}{2})\int (s - \tfrac{1}{2})w_1(s)\,ds$ (see Section 6.1 of Part I), and $(w_1(r), w_2(r))'$ is the Wiener process with $\Sigma = $ diag$[\sigma_{11}, \sigma_{22}]$. The conditioning variable here is $(\int w^*(r)^2\,dr)^{-1}$. Nevertheless the $t$-statistic does converge to $N(0, 1)$.

If $\sigma_{12} \neq 0$ the OLS to be run is

$$y_t = \hat a + \hat bt + \hat cx_t + \hat d\Delta x_t + \text{error}. \tag{13.32}$$

The $t$-statistic for $c$ converges to $N(0, 1)$. The results on (13.31) and (13.32) are invariant whether the true value of $\mu$ in (13.27) is zero or not.

The above results are summarized in Table 13.1.

TABLE 13.1 Various OLS with regressors containing deterministic trends

DGP of y | OLS | t-statistic converges to N(0, 1)
(a) $y_t = a + bx_t + \varepsilon_{2t}$, $f_t$ dominating $v_{1t}$ | $y_t = \hat a + \hat bx_t + $ error | no matter whether $\sigma_{12} = 0$ or $\neq 0$
(b) $y_t = a + bx_t + \varepsilon_{2t}$ | $y_t = \hat a + \hat bx_t + \hat c\Delta x_t + $ error | no matter whether $\sigma_{12} = 0$ or $\neq 0$
(c) $y_t = a + bt + cx_t + \varepsilon_{2t}$ | $y_t = \hat a + \hat bt + \hat cx_t + $ error | only when $\sigma_{12} = 0$
(d) $y_t = a + bt + cx_t + \varepsilon_{2t}$ | $y_t = \hat a + \hat bt + \hat cx_t + \hat d\Delta x_t + $ error | no matter whether $\sigma_{12} = 0$ or $\neq 0$

In (a) and (b) we are concerned with the co-integration that nullifies both the deterministic and the stochastic trends at once. The process of $\Delta x_t$ is $\Delta f_t + \varepsilon_{1t}$ with some non-stochastic $f_t$. In (a) $f_t$ dominates $v_{1t} = \sum_{s=1}^{t} \varepsilon_{1s}$, but in (b) $f_t$ need not dominate $v_{1t}$. In (c) and (d) we are concerned with the co-integration that nullifies the stochastic trends only. The deterministic trends of $x_t$ and $y_t$ must be both linear in so far as the trend on the right-hand side of the OLS equation is kept as $a + bt$. Any other forms of deterministic trends would entail a bias for the estimation of the co-integration vector. (A more complicated form of deterministic trend can be allowed for if the trend function on the right-hand side of the OLS equation subsumes the correct specifications of the models of deterministic trends in $x$ and $y$.)
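The Gaussian limit for the regression without a constant when $x_t$ contains a linear trend, with variance $3\mu^{-2}\sigma_{22}$ as quoted in the text for (13.29), can be checked by simulation. The following is a sketch under assumed values ($\mu = 1$, unit variances, correlation `rho = 0.5` between the shocks); the function names are hypothetical.

```python
import math
import random
import statistics


def scaled_error(rng, T, b=1.0, mu=1.0, rho=0.5):
    """T^{3/2}(b_hat - b) from y_t = b x_t + e2_t with x_t = mu * t + v_1t."""
    v1, sxx, sxe = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        v1 += e1
        x = mu * t + v1  # linear trend plus stochastic trend
        sxx += x * x
        sxe += x * e2  # b_hat - b = sum(x e2) / sum(x^2)
    return T ** 1.5 * sxe / sxx


def sample_variance(T=400, reps=400, seed=4):
    """Sample variance of T^{3/2}(b_hat - b); the limit variance is 3 * sigma_22 / mu^2."""
    rng = random.Random(seed)
    return statistics.variance([scaled_error(rng, T) for _ in range(reps)])
```

With $\sigma_{22} = 1$ and $\mu = 1$ the value returned by `sample_variance()` should lie in the neighbourhood of 3 even though the two shocks are correlated, which illustrates why the correlation becomes irrelevant once the deterministic trend dominates.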

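13.2.2 Two regressors

Next let us consider the case where two regressors $x_{1t}$ and $x_{2t}$ are involved. The result on the spurious regression is essentially identical to the case of a single regressor, but the result on the co-integrated regression is more complicated than in the case of a single regressor. Hansen (1992a) demonstrates the results in a mathematically unified form. I shall present a more elementary exposition of the results in Hansen (1992a).

Suppose that the regressors are generated by

$$x_{1t} = \mu_1t + v_{1t}, \qquad x_{2t} = \mu_2t + v_{2t}, \tag{13.33}$$

that the regression equation to generate $y_t$ is

$$y_t = b_1x_{1t} + b_2x_{2t} + \varepsilon_{3t}, \tag{13.34}$$

and that $\{(\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t})\}$ is i.i.d. with zero mean vector and covariance matrix $\Sigma = $ diag$[\Sigma^*, \sigma_{33}]$, where $\Sigma^*$ is $2 \times 2$. Regressors $x_1$ and $x_2$ are uncorrelated with the disturbance. The submatrix of $\Sigma$ which consists of the first and second rows and columns is assumed to be non-singular so that $x_{1t}$ and $x_{2t}$ are not co-integrated. This is an important assumption maintained through the present section. In the three variables process $(x_{1t}, x_{2t}, y_t)$ a vector $(-b_1, -b_2, 1)$ nullifies both the stochastic and deterministic trends. We are interested in the inference on $(b_1, b_2)$ through the OLS,

$$y_t = \hat b_1x_{1t} + \hat b_2x_{2t} + \text{error}.$$

To analyse $(\hat b_1, \hat b_2)$, let us introduce a transformation to canonical variables by

$$H = \mu^{-1}\begin{bmatrix} \mu_1 & \mu_2 \\ -\mu_2 & \mu_1 \end{bmatrix}.$$

Then $HH' = I_2$. Writing (13.34) as

$$y_t = (b_1, b_2)H'H(x_{1t}, x_{2t})' + \varepsilon_{3t},$$

it is seen that

$$y_t = c_1(\mu t + \xi_{1t}) + c_2\xi_{2t} + \varepsilon_{3t}, \tag{13.35}$$

where $(c_1, c_2)' = H(b_1, b_2)'$, $\mu = (\mu_1^2 + \mu_2^2)^{1/2}$, $\xi_{1t} = \mu^{-1}(\mu_1v_{1t} + \mu_2v_{2t})$, $\xi_{2t} = \mu^{-1}(-\mu_2v_{1t} + \mu_1v_{2t})$, and $v_{it} = \sum_{s=1}^{t} \varepsilon_{is}$. The time variable $t$ appears only in the first of two regressors in (13.35). The OLS $(\hat c_1, \hat c_2)$ in (13.35) satisfies

$$\begin{bmatrix} T^{3/2}(\hat c_1 - c_1) \\ T(\hat c_2 - c_2) \end{bmatrix} = \begin{bmatrix} T^{-3}\sum(\mu t + \xi_{1t})^2 & T^{-5/2}\sum(\mu t + \xi_{1t})\xi_{2t} \\ T^{-5/2}\sum(\mu t + \xi_{1t})\xi_{2t} & T^{-2}\sum\xi_{2t}^2 \end{bmatrix}^{-1}\begin{bmatrix} T^{-3/2}\sum(\mu t + \xi_{1t})\varepsilon_{3t} \\ T^{-1}\sum\xi_{2t}\varepsilon_{3t} \end{bmatrix}. \tag{13.36}$$

Let $(w_1(r), w_2(r), w_3(r))'$ be the Wiener process with covariance matrix $\Sigma = $ diag$[\Sigma^*, \sigma_{33}]$. Also write $(w_1(r), w_2(r))H' = (\tilde w_1(r), \tilde w_2(r))$. Then

$$\begin{bmatrix} T^{3/2}(\hat c_1 - c_1) \\ T(\hat c_2 - c_2) \end{bmatrix} \xrightarrow{D} \begin{bmatrix} \mu^2/3 & \mu\int r\tilde w_2(r)\,dr \\ \mu\int r\tilde w_2(r)\,dr & \int \tilde w_2(r)^2\,dr \end{bmatrix}^{-1}\begin{bmatrix} \mu\int r\,dw_3(r) \\ \int \tilde w_2(r)\,dw_3(r) \end{bmatrix}. \tag{13.37}$$

$(\tilde w_2(r), w_3(r))'$ is the Wiener process with the covariance matrix, diag$[\tilde\sigma_{22}, \sigma_{33}]$ where $\tilde\sigma_{22} = \mu^{-2}(-\mu_2, \mu_1)\Sigma^*(-\mu_2, \mu_1)'$. Note that $\hat c_1 - c_1$ is $O_p(T^{-3/2})$ while $\hat c_2 - c_2$ is $O_p(T^{-1})$. It is as though $\xi_{1t}$ is dominated out and the regressors consist of $\mu t$ and $\xi_{2t}$.

Conditionally upon

$$G = \begin{bmatrix} \mu^2/3 & \mu\int r\tilde w_2(r)\,dr \\ \mu\int r\tilde w_2(r)\,dr & \int \tilde w_2(r)^2\,dr \end{bmatrix}^{-1}, \tag{13.38}$$

$[T^{3/2}(\hat c_1 - c_1), T(\hat c_2 - c_2)]'$ is asymptotically distributed in $N(0, \sigma_{33}G)$. Keep in mind for a later reference that the asymptotic conditional distributions of $T^{3/2}(\hat c_1 - c_1)$ and $T(\hat c_2 - c_2)$ are respectively $N(0, \sigma_{33}G^{11})$ and $N(0, \sigma_{33}G^{22})$ where $G^{ij}$ is the $(i, j)$ element of $G$.

However, we are interested in the original parameter $(b_1, b_2)$ rather than $(c_1, c_2)$. Consider a linear combination of $(\hat b_1 - b_1, \hat b_2 - b_2)$ with weights $(d_1, d_2)$,

$$d_1(\hat b_1 - b_1) + d_2(\hat b_2 - b_2) = \mu^{-1}\left[(d_1\mu_1 + d_2\mu_2)(\hat c_1 - c_1) + (d_1, d_2)(-\mu_2, \mu_1)'(\hat c_2 - c_2)\right].$$

If $(d_1, d_2)(-\mu_2, \mu_1)' = 0$, the linear combination does not include $\hat c_2 - c_2$. The combination is $O_p(T^{-3/2})$. If $(d_1, d_2)(-\mu_2, \mu_1)' \neq 0$, the combination does include $\hat c_2 - c_2$, and it is $O_p(T^{-1})$. In this sense the asymptotic distribution of $(\hat b_1 - b_1, \hat b_2 - b_2)$ is degenerate, a point that was emphasized in Park and Phillips (1988).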

Page 197: Hatanaka econometry

184 Co-Integration Analysis in Econometrics

Note that (d\, d-i)(—f^i, Mi) ' = 0 means that the relative weights in the linearcombination are proportional to [L\ : 1^2- Note also that ti\b\ + /ia^2 is the co-efficient of time variable t on the right-hand side of (13.34). This coefficient canbe estimated with error only in OP(T~3/2).6

At first sight one might think that the degeneracy might cause trouble in theinference. Hansen (1992a) shows it does not. Let us consider the null hypothesis,d\b\-\-dib2 = di, where (d\, d%, d^) is a priori known. It can be shown that the t-statistic is asymptotically N(0, 1) under the null hypothesis regardless of whether(d\,d2)(-ii2,ni)' ~ or ,£0. Let x'\ - (xn,..., XIT), x2 = (x2i,... ,x2T) andX — (x\,x2). The f-statistic is

(13.39)

where $s$ is the standard error of regression in (13.34). Let $D_T = \mathrm{diag}[T^{3/2}, T]$, and consider

(13.40)

Write $\hat G = (D_T^{-1}H'X'XHD_T^{-1})^{-1}$ and denote its $(i, j)$ element by $\hat G_{ij}$. In fact $\hat G$ appeared in the right-hand side of (13.36). If $(d_1, d_2)(-\mu_2, \mu_1)' \neq 0$, the second element of $(d_1, d_2)HD_T^{-1}$ dominates the first element, and $T^2 \times$ (13.40) is

If $(d_1, d_2)(-\mu_2, \mu_1)' = 0$, the second element of $(d_1, d_2)HD_T^{-1}$ is zero, and $T^3 \times$ (13.40) is

Then consider

under the null hypothesis. If

6 To test if the co-integration vector $(b_1^0, b_2^0, -1)$ for $(x_1, x_2, y)$ annihilates the linear trend, we wish to test if $-\mu_1 b_1^0 - \mu_2 b_2^0 + \mu_3 = 0$, where $\mu_3$ is the slope of the linear trend of $y_t$. The $t$-test considered in the text is not directly useful because

(13.14)

and

but cannot be


If $(d_1, d_2)(-\mu_2, \mu_1)' \neq 0$, the $t$-statistic in (13.39) is looked upon as

Note that $\hat G \to G$ in (13.38). In either case the $t$-statistic converges asymptotically to $N(0, 1)$ by virtue of the asymptotic distributions of $(T^{3/2}(\hat c_1 - c_1), T(\hat c_2 - c_2))$ given in (13.37) under the null hypothesis. Conditionally upon any $G$ the statistic is asymptotically $N(0, 1)$, and hence it is $N(0, 1)$ unconditionally as well.

Next consider twice the $F$-statistic to test for $(b_1, b_2) = (b_1^0, b_2^0)$,

By virtue of (13.37) this is asymptotically $\chi^2(2)$, and the degeneracy is simply irrelevant.

If $\Sigma$ is not block diagonal so that $\varepsilon_{3t}$ is correlated with $(\varepsilon_{1t}, \varepsilon_{2t})$, we can use the method given in Section 13.1.3 above by running OLS

error.

If $\mu_2 = 0$ in (13.33) so that $x_{2t}$ does not have a linear trend, the transformation matrix $H$ is reduced to $I_2$. The above reasoning is simplified when we are interested in the inference on $b_1$ only. Here $(d_1, d_2) = (1, 0)$ in the above analysis of the $t$-statistic, and we have $(d_1, d_2)(-\mu_2, \mu_1)' = 0$ so that $\hat b_1 - b_1$ does not include $\hat c_2 - c_2$. Nevertheless the situation here is different from (13.28) and (13.29), where a single regressor contains a deterministic as well as a stochastic trend. Here $G$ is not block diagonal, and the limit distribution of $T^{3/2}(\hat c_1 - c_1)$ is not Gaussian unconditionally.

So far $y_t$ has been generated by (13.34) so that the co-integrating vector nullifies both the deterministic and stochastic trends at once. Suppose now that we are concerned with a co-integration that nullifies the stochastic trend only. The model is

in conjunction with (13.33) that generates $(x_{1t}, x_{2t})$. Run OLS

The $t$ and $2F$ on $(c_1, c_2)$ converge respectively to $N(0, 1)$ and $\chi^2(2)$ only when $\sigma_{13} = \sigma_{23} = 0$. However, the $t$ and $2F$ on $(c_1, c_2)$ in the OLS,

it is looked upon as

Writing

error.

error,


are respectively $N(0, 1)$ and $\chi^2(2)$ asymptotically even when $\sigma_{13} \neq 0$ or $\sigma_{23} \neq 0$.

Throughout the above the covariance matrix of $(\varepsilon_{1t}, \varepsilon_{2t})$ is non-singular so that $x_1$ and $x_2$ are not co-integrated.

by virtue of $\mu_2 \neq 0$ and $\sigma_{13} = 0$. $D_T(\hat b - b)$ converges to a (simple) Gaussian, $N(0, \sigma_{33}A^{-1})$. Standard testing procedures are applicable.

In my opinion we had better add $\Delta y_t$ to the regressor set of the OLS because the deterministic trend does not completely dominate the stochastic trend in finite samples.

West (1988) extends the above analysis to the case where $\{\varepsilon_t\}$ is generalized to a stationary process $\{u_t\}$. It is necessary to assume that $u_{3t}$, the disturbance in (13.43), be uncorrelated with $u_{1t} = x_t - \mu_1$ not only contemporaneously but also in all lags and forwards. However, $\Delta y_t$ may be correlated with $u_{3t}$. If $\mu_2 \neq 0$, $D_T(\hat b - b)$ is asymptotically (simple) Gaussian. The regressor $x_t$ can be easily extended to a vector. The extension of $y_t$ to a vector necessitates consideration in Section 13.2.2, which deprives the model of the essence of West (1988). Stock and West (1988) apply the method to the consumption function as formulated in Hall (1978). See also Banerjee et al. (1993: 178).

13.2.3 West model

Yet another variant of models is

where $\varepsilon_t = (\varepsilon_{1t}, \varepsilon_{2t}, \varepsilon_{3t})'$, $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$, and $E(\varepsilon_t\varepsilon_t') = \Sigma$ is positive definite. The essential assumptions here are $\mu_2 \neq 0$ and $\sigma_{13} = 0$. However, $\mu_1$ may possibly be zero, and $\sigma_{23}$ may possibly be non-zero. (The present model is not subsumed in Section 13.2.2 because every variable is $I(1)$ in that section.) Note that $x_t$ is $I(0)$, while $y_t$ is $I(1)$ with a deterministic linear trend. Let $b' = (b_1, b_2)$, and $\hat b$ be the OLS of $b$ in (13.43). Let $D_T = \mathrm{diag}[T^{1/2}, T^{3/2}]$, and $X$ be the data matrix of $(x_t, y_t)$. Then

If $\mu_1 = 0$, $A$ is diagonal. Neither $\sigma_{12}$ nor $\sigma_{22}$ appears in $A$. Since $\mu_2 \neq 0$, the impact of the deterministic trend in $y_t$ dominates that of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ in regard to $T^{-2}\sum x_t y_t$ and $T^{-3}\sum y_t^2$. Moreover,

(13.43)


I shall not give details on quadratic and cubic trends. A number of expressions in Hatanaka and Koto (1994) are useful to adapt the above reasoning to polynomial trends. I point out some basic points on the quadratic trend. Earlier it was found that the OLS was $\sqrt{T}$-consistent in the spurious regression with a linear trend. With a quadratic trend the OLS is $T^{3/2}$-consistent. Nevertheless the $t$-statistic to test $b = 0$ diverges when $b$ is indeed zero in the DGP. As for the co-integrated regression the expression (13.29) should be altered to $T^{5/2}(\hat b - b)$ being asymptotically normal regardless of whether the error is correlated or uncorrelated with the regressor. In the case of two regressors, the regressors are transformed into vectors of two variables, one having $t^2$ and another having $t$ but not $t^2$. The OLS is asymptotically Gaussian instead of mixed Gaussian. The $t$-statistic to test for $d_1 b_1 + d_2 b_2 = d_3$ converges to $N(0, 1)$, and twice the $F$-statistic to test for $(b_1, b_2) = (b_1^0, b_2^0)$ converges to $\chi^2(2)$ under the null hypothesis.

The conclusion of the present section is as follows. In so far as the co-integrated regression models are concerned, the applicability of conventional asymptotic inference procedures based on $t$- and $F$-statistics (using $N(0, 1)$ and $\chi^2$) is maintained throughout all different modes of deterministic trends. The reasons vary among different modes. In some cases the estimators are mixed Gaussian, and in others the estimators are simple Gaussian. The $t$- and $F$-statistics are robust to both the integration orders and the modes of deterministic trends in so far as $\{x_t\}$ and $\{y_t\}$ are co-integrated and $\{x_t\}$ is strictly exogenous. Moreover, the device in Section 13.1.3 extends the applicability of $t$- and $F$-statistics to the case where $\{x_t\}$ is not strictly exogenous, but the device is not robust to integration orders.

13.3 Serially Correlated Case

We now abandon the assumption that $\Delta x_t$ is i.i.d. Suppose that the $k$-element stochastic process $\{u_t\}$ is stationary with $E(u_t) = 0$ and $E(u_t u_{t+j}') = \Gamma_j$, $j = \ldots, -1, 0, 1, \ldots$. Note that $\Gamma_{-j} = \Gamma_j'$. Let $\Gamma = \sum_{j=-\infty}^{\infty}\Gamma_j$, which is the long-run covariance matrix of $\Delta x = u$. Write $v_t = \sum_{s=1}^{t} u_s$. Then instead of (13.2) we have

(13.44)

(13.45)

(13.46)

(13.4) has to be replaced by the following more complicated formula,

and instead of (13.3)


Second, since $\Gamma$ is diagonal, $w_1(r)$ and $w_2(r)$ are independent, and the right-hand side of (13.49) is $N(0, \gamma_{22}g)$ conditionally upon $g = \left(\int w_1(r)^2 dr\right)^{-1}$. With a consistent estimator of $\gamma_{22}$ to be denoted by $\hat\gamma_{22}$

13.3.1 Strict exogeneity

Suppose that $u_{1t}$ and $u_{2t}$ are uncorrelated not only contemporaneously but also in all lags and forwards. Then $x_t$ in (13.47) is strictly exogenous. First $\gamma_{12} = \lambda_{12} = 0$, and (13.48) is simplified to

(13.49)

(13.49')

is asymptotically distributed in $N(0, 1)$, which is due to Kramer (1986) and Park and Phillips (1988).

The OLS in the spurious regression does not converge in probability to any constant, while the OLS in the co-integrated regression is $T$-consistent.
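The contrast between the two regressions can be seen in a small Monte Carlo experiment. The following is a minimal sketch, not a reproduction of any result in the text: it assumes i.i.d. standard normal innovations, a regression without intercept, and uses hypothetical function names (`simulate_slopes`, `iqr`). The interquartile range of the OLS slope should shrink roughly in proportion to $1/T$ in the co-integrated case and should not shrink in the spurious case.

```python
import numpy as np


def ols_slope(x, y):
    """OLS slope of y on x without an intercept."""
    return float((x @ y) / (x @ x))


def simulate_slopes(T, n_rep, cointegrated, b=1.0, seed=0):
    """Draw n_rep OLS slopes from y_t = b x_t + disturbance, x_t a random walk.

    Assumption for illustration only: all innovations are i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_rep)
    for i in range(n_rep):
        x = np.cumsum(rng.standard_normal(T))  # x_t = v_{1t}
        e = rng.standard_normal(T)
        if cointegrated:
            y = b * x + e                      # stationary disturbance
        else:
            y = b * x + np.cumsum(e)           # I(1) disturbance: spurious regression
        slopes[i] = ols_slope(x, y)
    return slopes


def iqr(a):
    """Interquartile range, a dispersion measure that is robust to heavy tails."""
    q75, q25 = np.percentile(a, [75, 25])
    return float(q75 - q25)


if __name__ == "__main__":
    for T in (100, 400):
        co = iqr(simulate_slopes(T, 300, True, seed=T))
        sp = iqr(simulate_slopes(T, 300, False, seed=T + 1))
        print(f"T={T}: IQR co-integrated={co:.4f}, IQR spurious={sp:.4f}")
```

The interquartile range is used instead of the standard deviation because the spurious-regression slope has a non-degenerate limit distribution whose sample moments can be erratic.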

In the spurious regression, $y_t = bx_t + v_{2t}$, we have

(13.48)

we have


Compare (13.46) with (6.14) of Part I. The results in (13.44) and (13.46) arefound in Phillips and Durlauf (1986), Park and Phillips (1988), and Phillips(1988a). Keep in mind that the covariance matrix of Wiener process is thelong-run covariance matrix in all the above expressions.

Let $\{(u_{1t}, u_{2t})\}$ be a bivariate stationary process with mean zero and autocovariance matrix sequence $\{\Gamma_j\}$, and $v_{it} = \sum_{s=1}^{t} u_{is}$, $i = 1, 2$, and $x_t = v_{1t}$. Let $(w_1(r), w_2(r))'$ be a bivariate Wiener process with covariance matrix $\Gamma = \sum_{j=-\infty}^{\infty}\Gamma_j$. Also write $\Lambda = \sum_{j=0}^{\infty}\Gamma_j$. The $(i, j)$ elements of the $2 \times 2$ matrices $\Gamma$, $\Gamma_h$, and $\Lambda$ are denoted by $\gamma_{ij}$, $\gamma_{ij}^{(h)}$, and $\lambda_{ij}$ respectively. In the co-integrated regression,

(13.47)


What has necessitated estimation of the long-run variance $\gamma_{22}$ is that $\{u_{2t}\}$ is not i.i.d. This kind of problem has been dealt with in the traditional econometrics by the Cochrane-Orcutt transformation, assuming that $\{u_{2t}\}$ is an AR($p$),

(13.50)

where $\{\varepsilon_{2t}\}$ is i.i.d. with $E(\varepsilon_{2t}) = 0$ and $E(\varepsilon_{2t}^2) = \sigma_{22}$. In fact, once this assumption is made, two methods are available to get statistics that are asymptotically $N(0, 1)$. One does not use the Cochrane-Orcutt transformation, but directly estimates $\gamma_{22}$. The other uses the transformation, but does not estimate $\gamma_{22}$.

Both methods begin with the OLS with $y_t$ as the dependent variable and $x_t$ as the independent variable. Estimates of $(a_1, \ldots, a_p)$ are obtained by fitting (13.50) to residuals $\{\hat u_{2t}\}$. Let $a(L) = 1 - a_1 L - \ldots - a_p L^p$. In the first of the two methods, we note that $\gamma_{22} = a(1)^{-2}\sigma_{22}$. The term $a(1)$ is estimated by $\hat a(1)$, and $\sigma_{22}$ by the residuals in fitting (13.50) to $\{\hat u_{2t}\}$. The $\hat\gamma_{22}$ thus obtained is then substituted in (13.49').
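The first method can be sketched numerically as follows. This is an illustration under stated assumptions rather than the author's own procedure: the AR($p$) coefficients are fitted by OLS without an intercept, the innovation variance is estimated by the mean squared residual, and the function names (`fit_ar`, `long_run_variance_ar`) are hypothetical. The code returns $\hat\sigma_{22}/\hat a(1)^2$, the estimate of the long-run variance implied by $\gamma_{22} = a(1)^{-2}\sigma_{22}$.

```python
import numpy as np


def fit_ar(u, p):
    """OLS fit of u_t = a_1 u_{t-1} + ... + a_p u_{t-p} + e_t (no intercept).

    Returns the coefficient vector (a_1, ..., a_p) and the mean squared residual.
    """
    u = np.asarray(u, dtype=float)
    y = u[p:]
    # Column j-1 holds u_{t-j} aligned with y_t = u_t for t = p, ..., n-1.
    X = np.column_stack([u[p - j:len(u) - j] for j in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ a
    sigma2 = float(resid @ resid / len(y))
    return a, sigma2


def long_run_variance_ar(u, p):
    """Long-run variance implied by an AR(p) fit: sigma2 / a(1)^2."""
    a, sigma2 = fit_ar(u, p)
    a_at_one = 1.0 - float(a.sum())  # a(1) = 1 - a_1 - ... - a_p
    return sigma2 / a_at_one ** 2
```

For an AR(1) with coefficient 0.5 and unit innovation variance, for example, the implied long-run variance is $1/(1 - 0.5)^2 = 4$, and the estimate should be close to that value in a long sample.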

The second of the two methods is a two-step method. The first stage is again the OLS with $y_t$ as the dependent variable and $x_t$ as the independent variable: $\hat a(L)$ is obtained. The second stage is based on $y_t^* = a(L)y_t$, and $x_t^* = a(L)x_t$. From (13.47) and (13.50) it follows that

(13.51)

Write $\hat y_t^* = \hat a(L)y_t$, and $\hat x_t^* = \hat a(L)x_t$. In the second stage the dependent variable of OLS is $\hat y_t^*$ and the independent variable is $\hat x_t^*$. The $\hat b^*$ that results is our estimate of $b$.
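The two stages can be sketched as below. The sketch assumes a regression without intercept, fits the AR($p$) filter to the first-stage residuals by OLS, and drops the first $p$ observations when filtering; the names `ar_filter` and `two_step_cochrane_orcutt` are hypothetical and introduced only for illustration.

```python
import numpy as np


def ar_filter(z, a):
    """Apply a(L) = 1 - a_1 L - ... - a_p L^p to z, dropping the first p points."""
    z = np.asarray(z, dtype=float)
    p = len(a)
    out = z[p:].copy()
    for j in range(1, p + 1):
        out -= a[j - 1] * z[p - j:len(z) - j]
    return out


def two_step_cochrane_orcutt(x, y, p):
    """Two-step estimate of b in y_t = b x_t + u_t with AR(p) disturbance (p >= 1).

    Returns (b_star, standard error of b_star, fitted AR coefficients).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # First stage: OLS in levels, then AR(p) fitted to the residuals.
    b_first = (x @ y) / (x @ x)
    u = y - b_first * x
    lagged = np.column_stack([u[p - j:len(u) - j] for j in range(1, p + 1)])
    a_hat, *_ = np.linalg.lstsq(lagged, u[p:], rcond=None)

    # Second stage: OLS on the filtered variables.
    x_star = ar_filter(x, a_hat)
    y_star = ar_filter(y, a_hat)
    b_star = (x_star @ y_star) / (x_star @ x_star)
    resid = y_star - b_star * x_star
    s2 = (resid @ resid) / (len(resid) - 1)
    se = np.sqrt(s2 / (x_star @ x_star))
    return float(b_star), float(se), a_hat
```

The ratio `(b_star - b0) / se` then plays the role of the second-stage $t$-statistic for a hypothesized value `b0`; no long-run variance estimate enters it.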

Using (13.47) it is seen that

$\hat y_t^* = b\hat x_t^* + \varepsilon_{2t} + (\hat a(L) - a(L))u_{2t}$.

Therefore

Note that

where the last equality is in part based on the strict exogeneity assumption. Itis seen that

The long-run variance of $x^*$ is $a(1)^2\gamma_{11}$. By the assumption in the present section, $\{x_t^*\}$ and $\{\varepsilon_{2t}\}$ are totally uncorrelated in all leads and lags. Let $(w_1^*(r), w_2^*(r))$ be


which converges to $N(0, 1)$ because

That the single-step OLS on (13.47), i.e. $T(\hat b - b)$ in (13.49), and the second-step OLS on (13.51), i.e. $T(\hat b^* - b)$, are identically distributed asymptotically can be seen by comparing the expressions of two mixed Gaussian distributions. In fact this has been pointed out in Kramer (1986) and Phillips and Park (1988) in the context of comparison between GLS and OLS. Comparison in finite samples has not been made to the best of my knowledge. The first of the two methods proposed above bases its inference on $T(\hat b - b)$ as indicated in the $t$-statistic (13.49') in conjunction with an estimate of the long-run variance, $\gamma_{22}$. On the other hand, the second method avoids direct estimation of this long-run variance, which may be difficult with $T = 100$.

Extension to the case where $\{x_t\}$ is a vector process is straightforward in so far as $\{x_t\}$ is not co-integrated. The extension to the case where $\{y_t\}$ is a vector process will be explained at the end of the chapter.

13.3.2 Correlated error

Suppose that $\{(u_{1t}, u_{2t})\}$ is a bivariate stationary process with zero mean vector, that $x_t = \sum_{s=1}^{t} u_{1s} = v_{1t}$, and that $y_t$ is generated by (13.47), in which $u_{2t}$ and $x_t$ are correlated. Earlier in the case where $\{(u_{1t}, u_{2t})\}$ is i.i.d. introduction of $\Delta x_t$ in the regression (see (13.19)) enabled us to reach a mixed Gaussian estimator in spite of the correlated error. In the present case the process $\{u_{2t}\}$ can be projected on to $\{u_{1t}\}$ to get

which is $N(0, \sigma_{22}g^*)$ conditionally upon $g^* = \left(\int w_1^*(r)^2 dr\right)^{-1}$. The $t$-statistic on $b$ in the second stage is


the bivariate Wiener process with covariance matrix $\mathrm{diag}[a(1)^2\gamma_{11}, \sigma_{22}]$. Then

where $\{u_{2.1t}\}$ is stationary with zero mean but not necessarily i.i.d., and $\{u_{2.1t}\}$ and $\{u_{1t}\}$ are uncorrelated not only contemporaneously but also in all lags and leads. In practice the leads and lags may be truncated while retaining the above condition approximately, so that

(13.52')

(13.52)


Substitution of (13.52') into our data-generating process (13.47) yields

The $t$-statistic on $b$ is asymptotically $N(0, 1)$.

This is a revolution in econometric theory. Here regressor $x_t$ and disturbance $u_{2t}$ may be correlated in (13.47), but it does not pose any obstacle to estimation of $b$. This is made possible by $x_t$ being $I(1)$. This result is due to Phillips (1991a), Phillips and Loretan (1991), Saikkonen (1991), and Stock and Watson (1993).

The $x_t$ can be extended to a vector process in so far as $x_t$ is not co-integrated within itself. The extension to the case where $\{y_t\}$ is a vector process is explained at the end of the chapter.

Introduction of lagged and forwarded $\Delta x_t$ frees us from worrying about the correlated regressors in so far as the disturbance is believed to be stationary.
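A regression with leads and lags of $\Delta x_t$ can be assembled as in the sketch below. It is only an illustration: the exact form of (13.53) is not reproduced here, so the sketch assumes a scalar regressor, no intercept, $q_1$ leads and $q_2$ lags, and a hypothetical function name `leads_lags_ols`. Observations for which some lead or lag is unavailable are dropped.

```python
import numpy as np


def leads_lags_ols(x, y, q1, q2):
    """OLS of y_t on x_t and dx_{t+q1}, ..., dx_t, ..., dx_{t-q2}.

    Returns (coefficient on x_t, coefficients on dx ordered from the furthest
    lead j = -q1 to the furthest lag j = q2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(x)
    dx = np.diff(x)                    # dx[s - 1] = x[s] - x[s - 1]
    t = np.arange(q2 + 1, T - q1)      # time indices where every lead and lag exists
    cols = [x[t]]
    for j in range(-q1, q2 + 1):       # j < 0 is a lead, j > 0 is a lag
        cols.append(dx[t - j - 1])     # this is dx at time t - j
    Z = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Z, y[t], rcond=None)
    return float(coef[0]), coef[1:]
```

Increasing `q1` and `q2` removes more of the correlation between the disturbance and $\Delta x$ at the cost of estimating more nuisance coefficients, which is the bias and dispersion trade-off reported in the simulation studies cited in Section 13.4.1.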

If $\{u_{2t}\}$ does not cause $\{u_{1t}\}$ in Granger's sense, Theorem 11.2 in Section 11.3 suggests

(13.54)

where $\{u_{2.1t}\}$ satisfies all the conditions that are placed on $\{u_{2.1t}\}$ in (13.52). In other words the leads of $\Delta x_t$ in (13.53) are not necessary. One would like to discriminate $u_2$ causing and not causing $u_1 (= \Delta x)$ on the basis of observations of $(x_t, y_t)$. It is appropriate in practice to perform the discrimination by testing for $y$ failing to cause $x$, but the precise relation is as follows.

LEMMA 13.2. Suppose that $\{(u_{1t}, u_{2t})\}$ is a stationary process starting at $t = -\infty$, and construct $\{(x_t, y_t)\}$ for $t \geq 1$ by $x_t = \sum_{s=1}^{t} u_{1s}$ and $y_t = bx_t + u_{2t}$. (See Sections 2.2 and 11.3 for starting at $t = 1$ in construction of $I(1)$ variables.) Then

(13.55)

is equivalent to

(13.56)

Therefore we should run the OLS

(13.53)

The relevant Wiener process is $(w_1(r), w_{2.1}(r))$ such that its covariance matrix is $\mathrm{diag}[\gamma_{11}, \gamma_{22.1}]$, where $\gamma_{11}$ is the long-run variance of $\Delta x_t = u_{1t}$, $\gamma_{22.1}$ is the long-run variance of $u_{2.1t}$, and the off-diagonal elements are zero because $\{u_{1t}\}$ and $\{u_{2.1t}\}$ are totally uncorrelated because of (13.52) or (13.52'). It follows that $T(\hat b - b)$ is asymptotically $N(0, \gamma_{22.1}g)$ conditionally upon $g = \left(\int w_1(r)^2 dr\right)^{-1}$. We may run here again two steps of OLS using the Cochrane-Orcutt transformation. The second step OLS is run along

error.


Remark 1. (13.56) is the definition of $y$ failing to cause $x$, but (13.55) does not necessarily mean $u_2$ failing to cause $u_1$. The definition of $u_2$ failing to cause $u_1$ is

(13.57)

Proof of Lemma 13.2. The proof uses nothing more than the existence of relevant conditional expectations. $(x_{t-1}, y_{t-1}, \ldots, x_1, y_1)$ and $(u_{1,t-1}, u_{2,t-1}, \ldots, u_{11}, u_{21})$ are related by a non-singular transformation. (In fact the Jacobian is a constant, unity.) The probability density of $u_{1t}$ conditional upon $(x_{t-1}, y_{t-1}, \ldots, x_1, y_1)$ is identical to that of $u_{1t}$ conditional upon $(u_{1,t-1}, u_{2,t-1}, \ldots, u_{11}, u_{21})$, and

Likewise we get

Therefore

(13.55) and (13.56) are equivalent. QED

Remark 2. If $\{(u_{1t}, u_{2t})\}$ is a stationary, linear, indeterministic process,

becomes negligible as $t \to \infty$, and we have analogous relations on other conditional expectations. The difference between (13.55) and (13.57) becomes negligible. It thus holds true asymptotically that $u_2$ fails to cause $u_1$ if and only if $y$ fails to cause $x$.

The Granger non-causality test in a possibly co-integrated VAR will beexplained in Section 15.4.3.

There exist a number of methods to deal with the present case of correlated regression. The fully modified least squares due to Phillips and Hansen (1990) is given in Appendix 6. Phillips and Loretan (1991) consider methods that are analogous to the above (13.53) but include lags of $(y_t - bx_t)$ or $\Delta y_t$ instead of leads of $\Delta x_t$. Banerjee et al. (1993: 242-52) summarize them with emphasis on the weak exogeneity, and subject them to experimental investigations.

13.3.3 Deterministic trend

It should be obvious how the reasoning in Section 13.2 is to be modified when $\{\varepsilon_t\}$ there is replaced by $\{u_t\}$. Some summary statements are found at the end of the present chapter.

and


13.4 Miscellaneous Remarks including Direction of Co-integration

13.4.1 Simulations and applications

A comprehensive simulation study of finite sample distributions of the OLS and the estimator of $b$ in (13.53) is found in Stock and Watson (1993) for the cases where $\{x_t\}$ is strictly exogenous and also where $\{x_t\}$ and $\{u_t\}$ are correlated. The results for $T = 100$ are close to what one derives from the above asymptotic analyses. The OLS has no bias in strict exogeneity, but has a substantial bias in correlated regression. The estimator of $b$ in (13.53) has no bias in strict exogeneity, and its bias in correlated regression is much less than that of the OLS. Increasing the orders of leads and lags of $\Delta x_t$ in (13.53) reduces the bias, but increases the dispersion. The selection of these orders does pose a problem. It is worth noting that in the theoretical distributed lag relation (13.54) the lag order of $u_1$ terminates at $q$ if $\{(u_{1t}, u_{2t})\}$ is a VMA with order $q$, but the order extends to infinity if $\{(u_{1t}, u_{2t})\}$ is a VAR.

An application of the estimator of $b$ in (13.53) is made on the demand for money in Stock and Watson (1993). They analyse the historical data, seeking a stable long-run demand function.

The following application has been provided by Mototsugu Shintani and Yasuji Koto. It is concerned with a relation between the log of real income per capita, $x$, and the log of real consumption per capita, $y$, in the seasonally adjusted quarterly US data over the period 1947I-1993I. $T = 185$. The time-series charts are presented in Figure 13.1. It is generally thought that income and consumption are mutually interdependent so that in

$y_t = a + bx_t + u_t$

$x_t$ and $u_t$ may well be correlated. In fact $\{x_t\}$ has a deterministic trend, and as it dominates the stochastic trend in $\{x_t\}$ asymptotically, the $t$-statistic on $b$ in the

FIG. 13.1 Consumption and income per capita


FIG. 13.2 A tree of $(q_1, q_2)$

OLS as simple as

is asymptotically $N(0, 1)$. However, in finite samples the dominance is not complete, and we are thus led to the OLS,

We are concerned with the co-integration in which the deterministic and stochastic trends are eliminated at once so that the time variable is not introduced in (13.58). It is not necessary to specify the model of deterministic trend.

The null hypothesis that $y$ does not cause $x$ in Granger's sense has been tested with a VAR containing a constant term, and the hypothesis has been rejected even at 1% significance level.7 This means that leads of $\Delta x$ are needed in (13.58). The general-to-specific principle in Mizon (1977) and Hendry (1979) has been used to select $q_1$ and $q_2$. Assuming that neither can exceed 8, the tree diagram such as Figure 13.2 is formed. Starting from the top, we step down when and only when the highest-order coefficient $c_{-q_1}$ or $c_{q_2}$ is not significant at 1% significance level. (If there is a serial correlation in residuals the significance should be judged in the equation with Cochrane-Orcutt transformed variables.) We have settled at $(q_1, q_2) = (5, 5)$. With this choice of $(q_1, q_2)$ (13.58) has been estimated with Cochrane-Orcutt transformation. The $t$-statistic to test $b = 1$ is $-3.72$. The long-run elasticity of consumption to income is significantly less than unity.

13.4.2 Co-integration space implied by a co-integrated regression

In comparing (13.5) and (13.6) I have emphasized the importance of discriminating the spurious and the co-integrated regressions. Nevertheless my writing in the rest of Section 13.1 simply assumed that $x$ and $y$ are co-integrated. Noting that the co-integration rank is 1 in the co-integrated regression (with a scalar $y$) and zero in the spurious regression, I might rephrase the above assumption to

7 Incidentally it has also been found that income does not cause consumption in Granger's sense, which reconfirms a point that Hall (1978) discovered in connection with the rational expectations hypothesis.

(13.58)

error

error.


say that the co-integration rank has simply been assumed rather than estimated from the data.

In fact, what has been assumed is not just the co-integration rank. The following point is due to Johansen (1992a). Consider a regression equation in which both the dependent and independent variables are 2-element vectors,

(13.59)

where $u_t' = (u_{1t}, u_{2t})$ is a stationary process with zero mean. One would naturally interpret (13.59) to mean that the co-integration rank is 2 in $\{(x_{1t}, x_{2t}, y_{1t}, y_{2t})\}$, and that, writing

$[-C, I_2]$ is a representation of $B'$ in the error-correction representation, (12.18). The point to be emphasized is that the co-integration space is not just any one of the two-dimensional vector subspaces in the four-dimensional whole space. Denoting by $B_2$ the $2 \times 2$ submatrix of $B'$ that consists of its third and fourth columns, left multiplication by any $2 \times 2$ non-singular matrix $F$ never turns $B_2$ into a singular matrix. Therefore, a vector such as $(a, b, 0, 0)$ cannot be a co-integrating vector. Neither $x_{1t}$ nor $x_{2t}$ is $I(0)$, and no linear combinations of $x_{1t}$ and $x_{2t}$ are $I(0)$.

Let us consider this point in the general framework of Section 12.1. I reproduce the main points of the framework. $\{\Delta x_t\}$ is a $k$-element vector linear process, $\Delta x_t = C(L)\varepsilon_t$ ($\mu$ is suppressed to zero). The co-integration rank is $r$ so that $\rho(C(1)) = k - r$. An $r \times k$ matrix $B'$ is a co-integration matrix so that $B'C(1) = 0$. Since $\rho(B') = r$, there must be at least one $r \times r$ submatrix of $B'$ that is non-singular. Knowing which rows and columns of $B'$ are included in such a non-singular submatrix will be called knowing the location of a non-singular submatrix. Then the following proposition characterizes the co-integrated regression model.

LEMMA 13.3. Knowing the co-integration rank $r$ and location of a non-singular $r \times r$ submatrix of $B'$ in Section 12.1 induces a co-integrated regression model with $(k - r)$ non-cointegrated regressors and $r$ regressands.

Proof. For convenience of exposition let the non-singular submatrix be in the extreme right position of $B'$. Non-singularity of this submatrix is invariant through left multiplications of $B'$ by any non-singular matrices. It may then be convenient to normalize this submatrix to the unit matrix. $B'$ is now $(-P, I_r)$, and $(-P, I_r)C(1) = 0$. Let $\{x_t\}$ be partitioned as $\{(x_{1t}', x_{2t}')'\}$ with $(k - r)$ and $r$ elements respectively for $x_{1t}$ and $x_{2t}$. Since


is non-singular, the rank of

must be equal to $\rho(C(1)) = k - r$, and $(I_{k-r}, 0)C(1)$ has its full row rank. From the reasoning about (12.5) it is seen that $\{x_{1t}\}$ is not co-integrated, and that $\{(-P, I_r)x_t\}$ is stationary.
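The normalization used in the proof can be illustrated numerically. The sketch below takes any $r \times k$ matrix $B'$ whose extreme right $r \times r$ submatrix is non-singular and left-multiplies it by the inverse of that submatrix, which yields the form $(-P, I_r)$. The function name `normalize_cointegration_matrix` and the example matrices are hypothetical and serve only as an illustration.

```python
import numpy as np


def normalize_cointegration_matrix(B_prime):
    """Normalize an r x k matrix B' to the form (-P, I_r).

    Raises ValueError when the extreme right r x r submatrix is singular,
    in which case this particular normalization is not available.
    """
    B_prime = np.asarray(B_prime, dtype=float)
    r, k = B_prime.shape
    B2 = B_prime[:, k - r:]
    if np.linalg.matrix_rank(B2) < r:
        raise ValueError("extreme right r x r submatrix is singular")
    normalized = np.linalg.solve(B2, B_prime)  # equals B2^{-1} B'
    P = -normalized[:, :k - r]
    return P, normalized


if __name__ == "__main__":
    B = np.array([[2.0, 0.0, 2.0, 0.0],
                  [1.0, 1.0, 0.0, 1.0]])
    P, N = normalize_cointegration_matrix(B)
    print(P)  # the regression coefficients of x_2 on x_1
    print(N)  # (-P, I_2)
```

Left multiplication of $B'$ by any non-singular $r \times r$ matrix leaves the resulting $P$ unchanged, which is the invariance noted in the proof.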

Remark 1. Location of a non-singular submatrix is an identified property of the co-integration space, because it is invariant through left multiplications of $B'$ by any non-singular matrices.

Remark 2. It is important to bear in mind that the normalization used above is acceptable only when the extreme right $r \times r$ submatrix is non-singular. In general, a normalization of $B'$ can be performed by imposing $B'F = I_r$, where $F$ is a known $k \times r$ matrix with rank $r$, and $B'F = I_r$ is not generally associated with locations of non-singular submatrices. It is when $F'$ takes special forms such as $(0, I_r)$ that $B'F = I_r$ is related to location of a non-singular submatrix. It is the location of non-singular submatrices rather than normalizations that is basic in the co-integrated regression.8

Remark 3. The above reasoning justifies the expression that the co-integration is directed from $x_1$ to $x_2$. It implies some asymmetry between regressors and regressands, which resembles that between exogenous and endogenous variables in the analysis of economic theories dealing with non-stochastic variables. There the exogenous and endogenous variables have different roles as one is not concerned with relationships among exogenous variables but with relationships between endogenous and exogenous variables. (Incidentally the standard econometric definitions of exogenous and endogenous variables are not thought of in such analyses.) In empirical investigations of economic theories the location of a non-singular submatrix may be suggested by the economic theories.

Remark 4. It is possible for example that the extreme left and the extreme right submatrices are both non-singular. There can be a number of different co-integrated regressions with different sets of regressors, all of which are derived from the same model representation in Chapter 12.

Remark 5. As will be explained in Chapter 15 any normalization $B'F = I_r$ identifies $B'$. Thus $P$ in $(-P, I_r)$ is identified.

Remark 6. $\{x_{1t}\}$ must be non-co-integrated $I(1)$, but some or all elements of $\{x_{2t}\}$ may well be $I(0)$. And, if the $i$th element of $\{x_{2t}\}$ is $I(0)$, the $i$th row of $P$ is a zero vector. The case has been excluded in Sections 13.1-3, as indicated by the condition, $b \neq 0$, in (13.5) for practical considerations on the balanced regression, but the condition is not really required for theoretical considerations.

8 Phillips (1991a) develops his analysis of the co-integrated regression on the basis of a triangular representation of a co-integration matrix, and Hamilton (1994: 576-7) shows how this representation can be derived from the general representation of possibly co-integrated processes. The point emphasized here is that this derivation cannot be performed unless some property, indeed location of a non-singular $r \times r$ submatrix, is known.


The mathematical reasoning in (13.9)-(13.14) holds even when $b^0 = 0$ so that $y_t$ is $I(0)$.

A reference related to the above remarks is Phillips (1994).

13.4.3 Relation to the co-integrated VAR analysis

The method developed in Johansen (1988, 1991a) on the co-integrated VAR estimates the co-integration rank, and tests the hypotheses about the structure of co-integration space. In particular, the spurious regression (13.6) and the co-integrated regression (13.5) can be discriminated as follows. Initially confirm that both $x_t$ and $y_t$ are $I(1)$ by the univariate analysis given in Part I. The confirmation rules out that $\{(x_t, y_t)\}$ has co-integration rank, $r$, equal to 2. Then test the hypothesis, $r = 0$. If it is rejected, accept the co-integrated regression. If it is not rejected, accept the spurious regression.

More generally, consider a $(k + 1)$-element process, $\{(x_t', y_t)\}$, where $x_t$ has $k$ elements, $y_t$ is a scalar, and $x_t'$ and $y_t$ are each $I(1)$. It is assumed that $(x_t', y_t)$ contains all the relevant variables. A method such as (13.53) presupposes that $\{x_t\}$ is not co-integrated and also that the co-integration is from $x_t$ to $y_t$, i.e. the last element of the co-integrating vector is non-zero. It will be explained in a general framework in Section 15.2.1 that these conditions ($r = 1$ and the direction of co-integration) enable us to identify the normalized co-integrating vector, $(b_1, \ldots, b_k, 1)$. The conditions can be tested as follows. Initially test the hypothesis, $r = 0$, in $\{(x_t', y_t)\}$. If it is rejected, test the hypothesis, $r \leq 1$, against $r \leq 2$. If it is not rejected, judge that $r = 1$, and proceed to testing the hypothesis that the last element of the co-integrating vector is zero. See Section 15.2.2(5) and set $c' = (0, \ldots, 0, 1)$. The conditions required for (13.53) are accepted if the hypothesis is rejected. If $r = 0$ is not rejected, or if $r \leq 1$ is rejected, or if the last element of the vector being zero is not rejected, the required conditions are judged to fail.
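The sequence of tests just described amounts to a simple decision rule, which can be written out as follows. The sketch takes the three test outcomes as given Boolean inputs (it does not compute any test statistic), and the function and argument names are hypothetical.

```python
def conditions_for_leads_lags_regression(reject_r_eq_0,
                                         reject_r_le_1,
                                         reject_last_element_zero):
    """Decide whether the conditions required for a method such as (13.53) hold.

    Each argument is the outcome (True = rejected) of one hypothesis test:
      reject_r_eq_0            : test of co-integration rank r = 0
      reject_r_le_1            : test of r <= 1 (only meaningful once r = 0 is rejected)
      reject_last_element_zero : test that the last element of the
                                 co-integrating vector is zero
    """
    if not reject_r_eq_0:
        return False   # no co-integration found
    if reject_r_le_1:
        return False   # rank judged larger than 1
    # r is judged to be 1; the direction of co-integration must point to y.
    return bool(reject_last_element_zero)
```

Only the combination "rejected, not rejected, rejected" leads to acceptance of the required conditions.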

When the above scalar $y$ is extended to a vector with $r$ elements, the co-integrating regression with $y$ as regressands requires that the last $r \times r$ submatrix of $B'$ for $\{(x_t', y_t')\}$ be non-singular. I do not find literature indicating a test useful to discriminate singularity and non-singularity of this submatrix.

It seems that all the materials presented above in the present chapter are subsumed into the method to be presented in Chapter 15. The reason why they have nevertheless been treated independently in the present chapter is that the method in Chapter 15 requires data covering a considerably longer period than available for macroeconomic studies to produce a reasonably accurate estimate of the co-integration rank.

To those readers who are engaged in empirical studies of economic theories my advice is to apply first the method in Chapter 15. If the results definitively deny the economic theories, accept the results. Otherwise assume the co-integration rank and the direction of co-integration as specified by the economic theories, and proceed to hypothesis testing along the co-integrated regression


analysis given in the present chapter. Transition from the method in Chapter 15 to the regression method has been suggested in Johansen (1992c), and described in Section 15.4.1.

13.5 A Priori Specified Co-integrating Vector

Frequently economic theories specify particular co-integrating vectors. For example, when the theory of present-value model, (12.40), is applied to the term structure of interest rates, the theory specifies $\theta = 1$ so that the co-integration is $x_t - y_t$; i.e. the short-term rate equals the long-term rate. This has been investigated in Campbell and Shiller (1987). The purchasing power parity on foreign-exchange rates is another example, and it is investigated in Mark (1990) and Ardeni and Lubian (1991) with a priori specified co-integrating vector $(1, -1, 1)$ for $(p, p^*, f)$, where $p$, $p^*$, and $f$ are each logarithms of

where $\{u_t\}$ is a disturbance that may possibly be correlated with $\{\varepsilon_t\}$. It is supposed that long-run components of $x$ and $y$ are related as indicated by the first term on the right-hand side of (13.60), while short-run components are related as indicated by the second term. In particular $\gamma = 0$ in the permanent-income hypothesis.

Write $v_t = \sum_{s=1}^{t} \varepsilon_s$. The OLS in levels of $x$ and $y$ is

(13.60)

I have so far explained the impact of co-integration upon the regression analysis. The co-integration relation is a relation among long-run components in $I(1)$ variables. If one starts the analysis from the standpoint of regression analysis, it is important to keep in mind that the OLS in levels captures the relation among long-run components, discarding short-run components.

Ignoring initial terms, $x_t$ is decomposed into the long-run and short-run components as described in Section 2.2 of Part I.

and $y_t$ is generated by

13.4.4 Long-run relation in the OLS in levels


domestic price, foreign price, and the exchange rate of domestic currency per foreign currency.

In terms of general notations, if we want to investigate whether or not $b'$ is a co-integrating vector in a vector process $\{x_t\}$ that annihilates deterministic and stochastic trends at once, and if $b'$ is a priori specified, we only have to see whether $\{b'x_t\}$ is stationary or not. The discrimination to be sought is between the stationarity and the non-stationarity. Though the difference stationarity is a kind of non-stationarity, it is important to keep in mind that the trend stationarity is also a kind of non-stationarity unless the deterministic component is a constant. If one is concerned with the co-integration that nullifies stochastic trends only, one should examine whether $\{b'x_t\}$ is DS or TS.

It is useful to proceed along the classification adopted in Chapter 8. The stationarity is $M_1^0$, where the superscript 0 indicates a constant trend and the subscript 1 indicates absence of unit roots. In investigating the co-integration that nullifies both the deterministic and stochastic trends we must discriminate $M_1^0$ from all the rest as a model of $b'x_t$. The comparisons, Mg-o-M° and M^oMJ, have been described in Chapter 8. The past studies seem to investigate only MQ^-M} and M^—tM® by the Dickey-Fuller test, but this is evidently unsatisfactory.

When the stationarity of b'x_t is confirmed it is necessary to test for the hypothesis E(b'x_t) = 0 from the standpoint of the equilibrium analyses of economic theories (see Section 12.3.4). There is a standard method for it (see, for example, Fuller (1976: 230-6)).

Part I was devoted to the discrimination between DS and TS that is needed for the investigation of b'x_t that nullifies the stochastic trends only.

We may wish to perform a joint test of co-integration for r_0 multiple relationships with a priori specified coefficients. Extension of the above method to a vector time series has not been investigated. Hall, Anderson, and Granger (1992) use the maximum-likelihood method given in Chapter 15. Initially a sequence of tests for r = 0, r ≤ 1, r ≤ 2, ... is performed. If r = 0, r ≤ 1, ..., r ≤ r_0 − 1 are all rejected but r ≤ r_0 is not, we proceed to testing for the null hypothesis that the r_0-dimensional co-integration space includes the a priori specified vectors b by the method given in Section 15.2.2. It should be noted that the presence of r_0 co-integrating relations is evaluated in terms of its absence as the null hypothesis (see Section 15.1.3).
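The sequential decision rule just described can be sketched in a few lines. This is only an illustration of the stopping logic: the function name `select_rank` is hypothetical, and the test statistics and critical values are invented round numbers, not values from any table or from the methods of Chapter 15.

```python
def select_rank(test_stats, critical_values):
    """Return the first r for which the null 'rank <= r' is NOT rejected.

    test_stats[r] is the statistic for the null r = 0 (r = 0) or
    rank <= r (r >= 1); critical_values[r] is the matching critical value.
    Both sequences are supplied by the user (hypothetical inputs here).
    """
    for r, (stat, crit) in enumerate(zip(test_stats, critical_values)):
        if stat <= crit:          # null not rejected: stop here
            return r
    return len(test_stats)        # every null rejected: full rank


# Invented numbers, for illustration only.
stats = [45.0, 22.0, 3.0]
crits = [30.0, 15.0, 4.0]
r0 = select_rank(stats, crits)
print("selected co-integration rank:", r0)
```

With these invented inputs the nulls r = 0 and r ≤ 1 are rejected and r ≤ 2 is not, so the rule stops at 2; the a priori specified vectors would then be tested within a space of that dimension.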

13.6 Shortcomings in Many Past Empirical Studies

I continue on the testing of co-integration but now with a co-integrating vector that is unknown. As I stated earlier this study should be made by testing for the rank of co-integration along the analysis in Johansen (1988, 1991a). However, many past studies, especially those prior to Johansen (1988, 1991a), use the augmented Dickey-Fuller test applied to the residuals in the OLS regression along a co-integrating relation. In other words, the null hypothesis is a single equation, y_t = b'x_t + v_t, where {v_t} is I(1) so that (y_t, x_t') is not co-integrated. In the first step the b is estimated by OLS, and the residuals, v̂_t, are derived. In the second step the augmented Dickey-Fuller test statistic is computed to see if {v̂_t} is judged I(1). Phillips and Ouliaris (1990) derive the limiting distribution of the augmented Dickey-Fuller test statistic and tabulate it, assuming that E(Δx_t) = 0 in the data-generating process, while admitting demeaning and detrending of data along (13.30). Hansen (1992a) shows that, if the demeaning and detrending are not performed, the two-step test is not invariant to whether E(Δx_t) = 0 or ≠ 0 in the data-generating process, requiring some adjustment of dimensionality when E(Δx_t) ≠ 0 (see also Hamilton (1994: 591-601) for a lucid explanation of this point). What is more serious is the possible presence of co-integrating relations among elements of x_t (see Choi (1994) for the complicated limiting distribution of the test statistics). In my judgement the two-step test is applicable only when (i) the co-integration rank is a priori known to be at most 1 and (ii) the coefficient of y is known to be non-zero so that it can be normalized to 1. The test can hardly be useful to investigate the model specification of co-integrated regression regarding the points discussed in Section 13.4.3. 9
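The mechanics of the two-step residual test can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited paper: the data are simulated, the function names are hypothetical, only a constant is included in the first-step regression, and no critical values are supplied because they must be taken from the appropriate tables discussed in the text.

```python
import numpy as np


def ols(y, X):
    """Return OLS coefficients and residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta


def residual_adf_tstat(y, x, lags=1):
    """Step 1: OLS in levels. Step 2: ADF regression on the residuals."""
    T = len(y)
    _, v = ols(y, np.column_stack([np.ones(T), x]))   # residuals v_t
    dv = np.diff(v)                                    # dv[i] = v[i+1] - v[i]
    # ADF regression: dv[i] = rho * v[i] + sum_j phi_j * dv[i-j] + error
    idx = np.arange(lags, len(dv))
    Z = np.column_stack([v[idx]] + [dv[idx - j] for j in range(1, lags + 1)])
    coef, e = ols(dv[idx], Z)
    s2 = e @ e / (len(idx) - Z.shape[1])
    se_rho = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[0, 0])
    return coef[0] / se_rho                            # t-statistic on rho


# Simulated data (assumed design): one co-integrated pair, one spurious pair.
rng = np.random.default_rng(0)
T = 500
x = np.cumsum(rng.normal(size=T))                 # I(1) regressor, no drift
y_coint = 1.0 + 2.0 * x + rng.normal(size=T)      # stationary error
y_spur = np.cumsum(rng.normal(size=T))            # independent random walk

t_coint = residual_adf_tstat(y_coint, x)
t_spur = residual_adf_tstat(y_spur, x)
print("co-integrated:", t_coint, " spurious:", t_spur)
```

The statistic for the co-integrated pair is far more negative than for the spurious pair, but the decision in practice depends on the non-standard limiting distribution, which is why the choice of table matters.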

Shintani (1994) emphasizes that the two-step method still retains some use because the case where x is a scalar variable arises often in applications. Let me give prescriptions for practitioners for this special case. Suppose that {(u_{1t}, u_{2t})} is a stationary bivariate AR process, Δx_t = μ + u_{1t}, and

If μ ≠ 0, the limit distribution of the augmented Dickey-Fuller t-statistic on the residual, y_t − a − bx_t, is that given in Fuller (1976: 373, table 8.5.2, τ_τ, n = ∞), the table for the t-statistic in the case where detrending is performed. The finite sample distribution would depend upon the size of μ² in relation to the long-run variance of u_1. If μ = 0, the limit distribution is provided in Phillips and Ouliaris (1990, table IIa, n = 1).

The limit distribution of the augmented Dickey-Fuller t-statistic on the residual, y_t − a − bt − cx_t, is given in Phillips and Ouliaris (1990, table IIc, n = 1), no matter whether μ = 0 or ≠ 0. 10

9 Kremers, Ericsson, and Dolado (1992) investigate a two-step method applied to the Hendry model.

10 (1 − R²) is O_p(1) in the spurious regression while it is O_p(T^{-1}) in the co-integrated regression. The Durbin-Watson statistic is O_p(T^{-1}) in the spurious regression while it is O_p(1) in the co-integrated regression. Thus R² close to 1 is indicative of a co-integrated regression, while a very small value of the Durbin-Watson statistic is indicative of a spurious regression. However, these statistics are not useful in discriminating the spurious and co-integrated regressions because their limit distributions contain the long-run covariance matrix of (Δx_t, Δy_t) as nuisance parameters.


Below follows a partial list of the literature on empirical studies of co-integration on the basis of the above two-step method; its purpose is to indicate the fields of economics in which co-integration analysis is useful rather than to demonstrate the results that past co-integration analyses have achieved. The literature on the application of the method of Johansen (1991a) is presented separately in Section 15.5.

Existence of co-integration between (real) stock price and (real) stock dividend is a piece of evidence against a version of rational bubbles. Diba and Grossman (1988) and Lim and Phoon (1991) test for absence of co-integration. The purchasing power parity with unknown coefficients has been investigated in Taylor (1988), McNown and Wallace (1989), Layton and Stark (1990), Kim (1990), and Patel (1990). Baillie and Selover (1987) study a monetary model of foreign exchange. Meese and Rogoff (1988) examine the co-integration between real exchange rates on the one hand and differentials between real interest rates of different countries on the other. Hall (1986) investigates the relation between wage, price, and productivity, and Drobny and Hall (1989) analyse the consumption function viewed as a co-integrating relation. In all of the above studies the null hypothesis is the absence of co-integration. Even if the Johansen (1991a) method were used, the null hypothesis would be the zero co-integration rank.

Highlights of Chapter 13

Here I present results in Sections 13.1-13.3 in a more general framework. Let {(Δx_t', u_t')} be a jointly stationary, linear indeterministic process, and let x_t' and u_t' have respectively m and n elements. {y_t} is generated from {(Δx_t', u_t')} by

(13.61)

where a' and B are respectively 1 × n and m × n unknown parameters. (13.61) means that {x_t} and {y_t} are co-integrated. If E(Δx_t) ≠ 0 so that {x_t} has linear deterministic trends, [−B, I] nullifies both the stochastic and deterministic trends in (x', y'). We may also consider

(13.62)

where c_0 and c_1 are each unknown n-element vector parameters. If deterministic trends in y_t and x_t are respectively d_{0y} + d_{1y}t and d_{0x} + d_{1x}t, then (13.62) implies that c_0' = d_{0y}' − d_{0x}'B and c_1' = d_{1y}' − d_{1x}'B. [−B, I] is the co-integration matrix that nullifies only the stochastic trends in (x', y'). Deterministic trends are present in y_t' − x_t'B unless c_1' = 0. The OLS on (13.62) is equivalent to the OLS on (13.61) in which x and y are demeaned and detrended.
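The stated equivalence between including a constant and a linear trend among the regressors and regressing demeaned and detrended data is the usual partitioned-regression result, and it can be checked numerically. The sketch below assumes scalar x and y and simulated data; the variable names are illustrative and not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(1, T + 1, dtype=float)
x = np.cumsum(0.3 + rng.normal(size=T))            # I(1) with drift (assumed)
y = 1.0 + 0.05 * t + 2.0 * x + rng.normal(size=T)  # assumed data-generating process

D = np.column_stack([np.ones(T), t])               # constant and linear trend


def detrend(z):
    """Residual of z after projection on a constant and a linear trend."""
    return z - D @ np.linalg.lstsq(D, z, rcond=None)[0]


# (a) OLS of y on (1, t, x): coefficient on x
b_full = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)[0][2]

# (b) OLS of detrended y on detrended x, no deterministic terms
xd, yd = detrend(x), detrend(y)
b_detrended = (xd @ yd) / (xd @ xd)

print(b_full, b_detrended)
```

The two slope estimates agree up to rounding error, which is the sense in which the deterministic terms in (13.62) need not be modelled separately once the data are demeaned and detrended.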

The following results are based upon the assumption that {x_t} is not co-integrated.

1. If {u_t} and {x_t} are completely independent, (the degrees of freedom times) the F-statistics on (a portion of) B based upon the two-step least squares with Cochrane-Orcutt transformation in either (13.61) or (13.62) are asymptotically χ² (see Section 13.3.1). When {x_t} contains one but only one element that is I(1) and contains a linear or higher-order deterministic trend, the strict exogeneity condition can be eliminated on that element (see Section 13.2.1). Though very special logically, the case may be encountered frequently in econometric applications.

How to implement the Cochrane-Orcutt transformation in the case where y_t is a vector of n elements (n > 1) would require some explanation. Assume that

It is seen that

where vec is the column vector obtained by stacking columns. It follows from (13.61) that

The second-step estimation must be based upon the SUR (seemingly unrelated regression) calculation with y_t'A(L) as the dependent variable and (vec A(L) ⊗ x_t)' as the independent variables, where the coefficient vectors are identical among all equations. If A(L) is diagonal, diag{a_11(L), ..., a_nn(L)},

where B = [B_1, ..., B_n], and the ith equation has y_{it}a_ii(L) as the dependent variable, and a_ii(L)x_t' as the independent variable.

2. If {u_t} does not cause {Δx_t} in Granger's sense, (the degrees of freedom times) the F-statistics on B based upon the two-step least squares with the Cochrane-Orcutt transformation on

(13.63)

or

(13.64)

are asymptotically χ². The condition that u does not cause Δx can be verified by testing for the non-causality of y on x. It does not matter whether {x_t} has a linear deterministic trend, but no deterministic trend other than linear is permitted in (13.64).

3. More generally, even if {u_t} may cause {Δx_t} in Granger's sense, (the degrees of freedom times) the F-statistics on B based upon the two-step least squares with the Cochrane-Orcutt transformation on

(13.65)

or

(13.66)

are asymptotically χ². It does not matter whether {x_t} has a linear trend, but no deterministic trend other than linear is permitted in (13.66).

4. If {x_t} is co-integrated, there appear stationary components, and the inference resembles the traditional econometric theories as far as the consistency condition is concerned (see Sections 13.1.4-5).

My practical advice consists of one major trunk line and a number of side lines. In the trunk line one determines initially the integration orders of {x_t} and {y_t}, confirms that the regression is balanced, and, if both {x_t} and {y_t} are I(1), proceeds to 3 above, assuming that {x_t} and {y_t} are co-integrated. This advice is applicable even to the special case where {x_t} contains one but only one element that is I(1) including a deterministic trend. If the co-integration is to annihilate both the deterministic and stochastic trends, adopt (13.65). Here models of deterministic trends need not be specified. If it is to annihilate the stochastic trends only, use (13.66). Here it is necessary to introduce the correct model specification of deterministic trends on the right-hand side of (13.66) if it is to be generalized beyond the linear form. Any misspecification would distort the results of inference. As for the side lines the co-integrated regression as opposed to spurious regression may be confirmed by the rank determined by the Johansen method in Chapter 15. We had better confirm that the co-integration is directed from x to y, but I do not know how to do this except in the case where y is a scalar (see Section 13.4.3). One can move from 3 above to the simpler 2 above if {y_t} is found to fail to cause {x_t} in Granger's sense.


14

Inference on Dynamic Econometric Models

Inference theories in the previous chapter will be applied to econometric models. An example of dynamic econometric models is the error-correction model that consists of (12.29), (12.30), and (12.34) in Chapter 12. Here it is modified slightly for the purpose of illustrating the inference method to be given in the present chapter. Earlier the target variable was constructed by (12.30), but here I generalize it to

(12.34) is rewritten here with different notations,

(14.1)

Using c(L)μ = c(1)μ and (c(L) − 1)ε_{2t} = (1 − c(L)^{-1})(Δx_t − c(1)μ) it is seen that E(x_t | I_{t−1}) − x_{t−1} = c(1)μ + (1 − c(L)^{-1})(Δx_t − c(1)μ) = μ + (1 − c(L)^{-1})Δx_t. The new model is

(14.2)

which can also be written as

(14.2')

(14.1) is added to them. It is assumed that 1 > 1 − γ > 0 and that {(ε_{1t}, ε_{2t})} is i.i.d. with zero mean. I shall discuss the condition on E(ε_{1t}ε_{2t}) as I proceed. The expression (14.2) involves c(L) in (14.1) while the expression (14.2') involves only the innovation in (14.1), and both expressions have been used in the literature of rational expectations.

Another example of econometric dynamic models is the linear quadratic model proposed in Kennan (1979). It has been used in many empirical studies of rational expectations, and analysed on stochastic trend and co-integration in Dolado, Galbraith, and Banerjee (1991) and Gregory, Pagan, and Smith (1993). Wickens (1993) also presents a related study. In Kennan (1979) an economic agent minimizes

where y* is the target variable generated by

(14.3)


{y_t} is the decision variable, and {(x_t, ε_{1t})} is exogenous to the agent. The β is the discount factor so that 1 > β > 0, and a (> 0) is the relative weight, w_1/w_2, between w_1 on the deviation from the target, (y_{t+s} − y*_{t+s})², and w_2 on the adjustment cost, (y_{t+s} − y_{t+s−1})². {ε_{1t}} is i.i.d. with zero mean. Kennan (1979) derives the behaviour equation as follows. The algebraic equation for z,

has two positive real roots, the smaller of which is less than unity. Let this root be λ, which of course depends on β and a. It then follows from the optimization of the agent that

where d = (1 - pwl(c0pk + ci(;6A.)2 + ...). Substituting this into (14.4) we get the behaviour equation,

which can also be written as

(14.5')

The model is (14.1) and (14.5), or (14.1) and (14.5'). (14.5) contains c(L), but (14.5') does not.

Both (14.2) and (14.5) can be regarded as special cases of

(14.6)

where k and the as and bs in a(L) and b(L) are some known, possibly non-linear functions of an r-dimensional vector parameter, θ. Writing a(L) = a_0 + a_1L + ... + a_pL^p, a_0 should not be normalized to unity because the normalization would mean that a_0 is not a function of θ. The inference is to be made about θ and not about k or the as and bs. It is assumed that all roots of a(z) = 0 lie outside of the unit circle so that a(L) is invertible. If {Δx_t} is stationary so is {Δy_t}, i.e. if {x_t} is I(1) so is {y_t}.1 b(L) may be an infinite power series of L,

1 If {x_t} is I(1) and {a_j} is bounded by a decaying exponential, then y_t = Σ_{j=0}^{∞} a_j x_{t−j} is I(1). For example, if x_t = Σ_{s=1}^{t} ε_s and {ε_t} is i.i.d. with zero mean, Δy_t = a_0ε_t + a_1ε_{t−1} + ..., which is I(0). When the relation is expressed as b(L)y = x, where b(L) is a polynomial of L, b(L)

which is equation (2.13) of Kennan (1979). Now suppose that x_t is generated by (14.1) and that {ε_{1t}} in (14.3) is independent of {ε_{2t}} in (14.1). Let

(14.4)

Then


but {b_j} is bounded by a decaying exponential. The data-generating process of {x_t} in (14.6) is (14.1), and in general θ includes the cs in (14.1). The relation between {ε_{1t}} in (14.6) and {Δx_t} will be specified as we proceed. But {ε_{1t}} and {ε_{2t}} are i.i.d.,2 and {c_j} is bounded by a decaying exponential.

Another important class of dynamic relations is the one developed on the model specification methodology in Davidson et al. (1978), Hendry and Mizon (1978), Hendry and Richard (1982), Hendry, Pagan, and Sargan (1984), and Hendry (1987, 1993). It is

(14.7)

Two points worthy of emphasis are that an economic theory specifies y_t = αx_t as the long-run equilibrium and that {ε_{1t}} should be none other than i.i.d. Both a(L) and b(L) are of finite orders, and the orders as well as the as and bs may be estimated by the standard time-series analysis with no reference to economic theories. The weak exogeneity of x_t with respect to δ, α, the as, and bs is also an important part of the specification. Writing φ(z) = 1 − (1 + δ)z − a(z)z(1 − z), it is assumed that the equation φ(z) = 0 has all its roots outside of the unit circle so that if x_t is I(1) so is y_t. An implication of this assumption is that φ(1) > 0, as shown in Section 6.2.3 of Part I. In turn φ(1) > 0 implies δ < 0. It can be easily confirmed that if {x_t} is I(1), y_t = αx_t is the long-run equilibrium. I shall call (14.7) the Hendry model. (14.1) may be associated with (14.7). (14.7) does not logically follow from (12.18), but I accept (14.7) as an important strategy to model the reality.
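The root condition on φ(z) and its implication φ(1) = −δ > 0 can be checked numerically for given parameter values. The sketch below assumes that a(z) = a_0 + a_1z + ... is supplied as a list of coefficients in ascending powers (an indexing convention chosen here, not stated in the text); the function names are hypothetical.

```python
import numpy as np


def phi_coefficients(delta, a):
    """Coefficients of phi(z) = 1 - (1 + delta) z - a(z) z (1 - z),
    in ascending powers of z, with a(z) = a[0] + a[1] z + ... (assumed)."""
    a = np.atleast_1d(np.asarray(a, dtype=float))
    phi = np.zeros(len(a) + 2)
    phi[0] = 1.0
    phi[1] -= 1.0 + delta
    for j, aj in enumerate(a):
        phi[j + 1] -= aj      # term -a_j z^(j+1)
        phi[j + 2] += aj      # term +a_j z^(j+2)
    return phi


def roots_outside_unit_circle(delta, a):
    """True if every root of phi(z) = 0 has modulus greater than one."""
    phi = phi_coefficients(delta, a)
    roots = np.roots(phi[::-1])          # np.roots expects highest power first
    return bool(np.all(np.abs(roots) > 1.0))


# Illustrative values only.
print(roots_outside_unit_circle(-0.5, [0.3]))   # delta < 0
print(roots_outside_unit_circle(0.1, [0.3]))    # delta > 0
print(phi_coefficients(-0.5, [0.3]).sum())      # phi(1), which equals -delta
```

With a(z) reduced to a scalar a, the polynomial is 1 − (1 + δ + a)z + az², the form used in the simplified model of Section 14.1.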

In the present chapter I shall describe inference methods on the general dynamic model (14.1) and (14.6) and the Hendry model, (14.1) and (14.7). Following Phillips (1991a) and Phillips and Loretan (1991) the methods are classified into two groups. In the first group the inference on the long-run parameter precedes that on the other parameters. It is called the two-step method. In the second group both the long-run and the short-run parameters are jointly subjected to the inference. It will be called the single-step method. My proposal on the general dynamic model (14.6) is based upon (14.15) below, which allows us to truncate the infinite power series, b*(L), but otherwise it closely follows Phillips and Loretan (1991). In my judgement the two-step method has wider applicability than the single-step method.

Throughout the present chapter we shall be concerned only with estimation and hypothesis testing on parameters, assuming that given models are validly specified. A part of the specification is the co-integration between x and y. This can be tested by the method in Chapter 15 within the framework of VAR, which is logically more general than the econometric models considered here. Kremers,

must represent a stable difference equation, and Boswijk (1994) presents a method to test for the instability.

2 If ε_{1t} is replaced by a stationary AR, e.g., u_t such that (1 − ρL)u_t = ε_{1t}, then a(L), k, and b(L) are replaced by a*(L) = a(L)(1 − ρL), k* = k(1 − ρ), and b*(L) = b(L)(1 − ρL); θ is replaced by θ* = (θ', ρ); and the a*s, k*, and b*s are new functions of θ*.


Ericsson, and Dolado (1992) consider discrimination between co-integration and spurious regression specifically on the Hendry model assuming that α is known.

14.1 Hendry Model with the Two-Step Method

My explanation of the two-step method will begin with the Hendry model (14.7) in conjunction with the DGP for x_t, (14.1). To simplify the following exposition it is assumed that a(L) and b(L) in (14.7) are just a and b respectively so that the model is

(14.7')

This can be rewritten as

(14.8)

It is assumed that 1 − (1 + δ + a)L + aL² is invertible. With

(14.9)

it is seen that

This kind of reparametrization is found useful throughout the present chapter.It follows from (14.8) that

(14.11)

which implies that y_t − αx_t is stationary.

The first step of our two-step method is based on (14.11). Assume that an infinite power series of L, (1 − (1 + δ + a)L + aL²)^{-1}(b_0* + b_1*L) can be truncated at, say, p. Let us run an OLS with y_t as the dependent variable and (x_t, Δx_t, ..., Δx_{t−p}) as the independent variable. Though (14.11) indicates constraints on the coefficients of independent variables, such constraints are ignored in the present OLS. Let the estimate of α be denoted by α̂. Suppose that the true value of μ in (14.1), μ⁰, is zero. Then, if (1 − (1 + δ + a)L + aL²)^{-1}ε_{1t} does not cause Δx_t in Granger's sense, the reasoning given in Section 13.3.2 shows that T(α̂ − α) is asymptotically mixed Gaussian. Suppose next that μ⁰ ≠ 0. Then whatever the relation between ε_{1t} and Δx_t may be, T^{3/2}(α̂ − α) is asymptotically Gaussian. How to perform a statistical inference about α will be explained later.

In the second step of our two-step method we rewrite (14.7') as

(14.12)

If μ⁰ = 0, α̂ − α is O_p(T^{-1}) and x_t is O_p(T^{1/2}) so that (α̂ − α)x_{t−1} is O_p(T^{-1/2}). If μ⁰ ≠ 0, α̂ − α is O_p(T^{-3/2}) and x_t is O_p(T) so that (α̂ − α)x_{t−1} is again O_p(T^{-1/2}). In either case the last term on the right-hand side of (14.12) may be ignored. We get

(14.13)

If {ε_{1t}} in (14.7') and {ε_{2t}} in (14.1) are mutually independent, Δx_t is weakly exogenous in (14.13) with respect to (δ, a, b). Noting that (14.13) is an equation on I(0) variables, and also that ε_{1t} is uncorrelated asymptotically with other variables in (14.13), we run an OLS with Δy_t as the dependent variable and ((y_{t−1} − α̂x_{t−1}), Δy_{t−1}, Δx_t) as the independent variable to estimate (δ, a, b).

The feature of the two-step method is to take advantage of T- or T^{3/2}-consistency of the estimator of α, which can be achieved without bothering about the details of short-run relations. In fact the short-run relations are taken into consideration in the first step only in introducing (Δx_t, ..., Δx_{t−p}) in the regressor set, while ignoring the constraints on the coefficients of these regressors. Moreover, the relation between {ε_{1t}} and {ε_{2t}} required in the first step is weaker than in the second step.
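The two steps can be illustrated on simulated data. The sketch below assumes the simplified Hendry model in the form Δy_t = δ(y_{t−1} − αx_{t−1}) + aΔy_{t−1} + bΔx_t + ε_{1t}, which is inferred here from the lag polynomial 1 − (1 + δ + a)L + aL² and the second-step regressor list; it further assumes Δx_t = ε_{2t} (no drift, c(L) = 1), mutually independent Gaussian disturbances, and illustrative parameter values. Function names are hypothetical.

```python
import numpy as np


def simulate(T, alpha, delta, a, b, rng):
    """Simulate the assumed simplified model with x a driftless random walk."""
    e1, e2 = rng.normal(size=T), rng.normal(size=T)
    x = np.cumsum(e2)
    y = np.zeros(T)
    for t in range(2, T):
        dy = (delta * (y[t - 1] - alpha * x[t - 1])
              + a * (y[t - 1] - y[t - 2])
              + b * (x[t] - x[t - 1]) + e1[t])
        y[t] = y[t - 1] + dy
    return y, x


def two_step(y, x, p=4):
    T = len(y)
    dx, dy = np.diff(x), np.diff(y)        # dx[i] = x[i+1] - x[i], i.e. Delta x_{i+1}
    # Step 1: levels OLS of y_t on (x_t, dx_t, ..., dx_{t-p}), constraints ignored
    t1 = np.arange(p + 1, T)
    X1 = np.column_stack([x[t1]] + [dx[t1 - 1 - j] for j in range(p + 1)])
    alpha_hat = np.linalg.lstsq(X1, y[t1], rcond=None)[0][0]
    # Step 2: OLS of dy_t on (y_{t-1} - alpha_hat x_{t-1}, dy_{t-1}, dx_t)
    t2 = np.arange(2, T)
    ecm = y[t2 - 1] - alpha_hat * x[t2 - 1]
    X2 = np.column_stack([ecm, dy[t2 - 2], dx[t2 - 1]])
    short_run = np.linalg.lstsq(X2, dy[t2 - 1], rcond=None)[0]
    return alpha_hat, short_run            # short_run = (delta, a, b) estimates


rng = np.random.default_rng(0)
y, x = simulate(2000, alpha=2.0, delta=-0.5, a=0.2, b=0.5, rng=rng)
alpha_hat, short_run = two_step(y, x)
print("alpha_hat:", alpha_hat)
print("(delta, a, b) estimates:", short_run)
```

The first step does not use the structure of the short-run dynamics beyond adding leads-free differences of x, yet the long-run estimate is already very close to the assumed value; the second step then behaves like an ordinary regression among stationary variables.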

The inference on α is made in the first step. Since (1 − (1 + δ + a)L + aL²)^{-1}ε_{1t} is not i.i.d. in (14.11), the t-statistic is not asymptotically N(0, 1) as explained in Section 13.3.1. If a Cochrane-Orcutt transformation is performed on all the relevant variables, y_t, x_t, Δx_t, ..., Δx_{t−p}, then the t-statistic on the coefficient of the transformed x is asymptotically N(0, 1).

The inference on (δ, a, b) is made in the second step, treating α̂ as though it is the true value of α. This part of our inference is a standard one. √T(δ̂ − δ, â − a, b̂ − b) is asymptotically N(0, V), where V is defined by

Frequently we encounter Hendry models having multiple equations. It is easy to see that the above inference procedure for a single equation can be extended to multiple equations. The second step involves an SUR (seemingly unrelated regression) estimation. See Highlights of Chapter 13 for multiple equations.

The following will be found useful later in relation to the single-step method. Suppose that our inference on α is based upon the OLS on (14.11) without Cochrane-Orcutt transformation with y_t as the dependent variable and (x_t, Δx_t, ..., Δx_{t−p}) as the independent variable. With ψ(L) = (1 − (1 + δ + a)L + aL²)^{-1}, it is seen that the long-run variance of ψ(L)ε_{1t} is δ^{-2}σ_11. T(α̂ − α) is asymptotically N(0, δ^{-2}σ_11 g) conditionally upon g = (∫w(r)² dr)^{-1}, where w(r) is a scalar Wiener process with its variance equal to c(1)²σ_22, i.e. the long-run variance of Δx_t. Therefore it is δT(α̂ − α) instead of T(α̂ − α) that is asymptotically mixed Gaussian with N(0, σ_11 g).


14.2 Dynamic Equation with the Two-Step Method

The model to be considered here is (14.1) and (14.6). Keep in mind that k, the as, and the bs are functions of θ. Define

Then α is a function of θ, and y_t = αx_t is the long-run equilibrium. Construct

(14.15)

It is seen that

(14.14)

so that b_j* = α Σ_{h=j+1}^{∞} a_h − Σ_{h=j+1}^{∞} b_h, where a_{p+1} = a_{p+2} = ... = 0. Since {b_j} is bounded by a decaying exponential, so is {b_j*}.3 Readers should recognize that a similar reparametrization was performed in (14.9) and (14.10). Using (14.15), (14.6) is rewritten as
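The reparametrization can be verified numerically for finite polynomials. The sketch below assumes that (14.6) has the form a(L)y_t = k + b(L)x_t + ε_{1t} and that the long-run parameter is α = b(1)/a(1); both are inferences from the surrounding text rather than formulas quoted from it. Under those assumptions the coefficients b_j* given above satisfy b(L) − αa(L) = (1 − L)b*(L), which is what allows the level of x to be replaced by differences of x.

```python
import numpy as np


def long_run_and_bstar(a, b):
    """Return alpha = b(1)/a(1) and the coefficients b*_j (ascending powers).

    a and b hold the coefficients of a(L) and b(L) in ascending powers of L;
    at least one of them must have two or more coefficients.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    alpha = b.sum() / a.sum()
    n = max(len(a), len(b))
    a_pad = np.pad(a, (0, n - len(a)))
    b_pad = np.pad(b, (0, n - len(b)))
    d = alpha * a_pad - b_pad                 # alpha * a_h - b_h
    tail = np.cumsum(d[::-1])[::-1]           # tail[j] = sum over h >= j of d_h
    b_star = tail[1:]                         # b*_j = sum over h >= j+1 of d_h
    return alpha, b_star, a_pad, b_pad


# Illustrative coefficients only.
alpha, b_star, a_pad, b_pad = long_run_and_bstar([1.0, -0.5], [0.3, 0.4, 0.1])
lhs = b_pad - alpha * a_pad                   # coefficients of b(L) - alpha a(L)
rhs = np.convolve([1.0, -1.0], b_star)        # coefficients of (1 - L) b*(L)
print(alpha, b_star)
print(lhs, rhs)
```

For the illustrative coefficients, α = 0.8/0.5 = 1.6 and b*(L) = −1.3 − 0.1L, and the two coefficient vectors printed on the last line coincide.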

(14.16)

The first step of the two-step method is concerned with estimation of α in (14.16). For this we regard y_t as the dependent variable and (x_t, 1, Δx_t, ..., Δx_{t−m}) as the independent variable, assuming that a(L)^{-1}b*(L) can be truncated at m. The estimation consists of two stages. The (a_1, ..., a_p) in a(L) is estimated from residuals in the first OLS to get â(L). The second stage has â(L)y_t as the dependent variable, and â(L)(x_t, 1, Δx_t, ..., Δx_{t−m}) as the independent variable. The t-statistic on â(L)x_t is asymptotically N(0, 1). The asymptotic inference procedure is invariant to μ⁰ = 0 or ≠ 0 in (14.1). It is sufficient for the reasoning to assume that a(L)^{-1}ε_{1t} does not cause Δx_t in Granger's sense.
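The two stages can be sketched as a Cochrane-Orcutt style calculation. This is a minimal illustration under stated assumptions: the residual autoregression is fitted by OLS with its leading coefficient normalized to one, the data are simulated with an AR(1) disturbance, only Δx_t (m = 0) is included among the differenced regressors, and all function names are hypothetical.

```python
import numpy as np


def fit_ar(resid, p):
    """OLS fit of resid_t = rho_1 resid_{t-1} + ... + rho_p resid_{t-p} + e_t."""
    n = len(resid)
    X = np.column_stack([resid[p - j:n - j] for j in range(1, p + 1)])
    return np.linalg.lstsq(X, resid[p:], rcond=None)[0]


def ar_filter(z, rho):
    """Apply (1 - rho_1 L - ... - rho_p L^p) to z (1-D or column-wise 2-D)."""
    z = np.asarray(z, dtype=float)
    p, n = len(rho), len(z)
    out = z[p:].copy()
    for j, r in enumerate(rho, start=1):
        out -= r * z[p - j:n - j]
    return out


def two_stage(y, X, p=1):
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]       # stage 1: plain OLS
    rho = fit_ar(y - X @ beta0, p)                      # AR fit to its residuals
    ys, Xs = ar_filter(y, rho), ar_filter(X, rho)       # transform all variables
    beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]       # stage 2: OLS again
    e = ys - Xs @ beta
    s2 = e @ e / (len(ys) - Xs.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Xs.T @ Xs)))
    return beta, se, rho


# Simulated example (assumed design): AR(1) disturbance with coefficient 0.6.
rng = np.random.default_rng(2)
T = 1000
x = np.cumsum(rng.normal(size=T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

dx = np.diff(x)
X = np.column_stack([x[1:], np.ones(T - 1), dx])        # (x_t, 1, dx_t)
beta, se, rho = two_stage(y[1:], X, p=1)
print("coefficient on x:", beta[0], " s.e.:", se[0], " rho:", rho[0])
```

The ratio of the estimated coefficient on the transformed x (minus a hypothesized value) to its standard error plays the role of the t-statistic referred to in the text.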

As for the second step of our two-step method consider

(14.17)

We may ignore the last term on the right-hand side of (14.17) to obtain

(14.18)

This is a relation among I(0) variables. If α is not an element of θ but a function of θ, say, α = h(θ_1, ..., θ_r), it can be solved for θ_r to get θ_r = f(θ_1, ..., θ_{r−1}, α). The new θ is (θ_1, ..., θ_{r−1}, α), and the dependence of k, the as, and the bs upon the old θ is re-expressed as the dependence upon the new

3 Compare this with n. 5 of Chapter 12.


does provide a consistent estimator of θ*. We may choose between the MLE and the non-linear least squares on (14.19). The latter is asymptotically less efficient but easier to compute.4 (Although a(L) was estimated previously in connection with (14.16), it is θ*, not a(L), that we like to estimate.)

If {Δx_t} is not weakly exogenous in (14.18) with respect to θ*, its efficient estimation requires joint estimation of (θ**, c_1, ..., c_q, μ, σ_22). If one is willing to sacrifice the efficiency for convenience in calculations, one may estimate (c_1, ..., c_q, μ, σ_22) in (14.1), and substitute them in (14.18). It will be seen that {Δx_t} is not generally weakly exogenous in the models that involve rational expectations.

The model (14.1) and (14.2) or (14.2') may be used for illustration. The α there corresponds to α in (14.14), and it is an element of parameters to be estimated. The long-run relation is y_t = αx_t. Assuming that E(ε_{1t}ε_{2t}) = 0, and that (14.2) instead of (14.2') is used, Table 14.1 shows the correspondence between notations in the general model (14.1) and (14.6) and notations in the model (14.1) and (14.2). What is denoted by b*(L) in the general model is β − α − βc(L)^{-1} in the illustrated model. The estimation of α in the first step requires no explanation. The model to be considered at the second step is

(14.20)

The a_0 is unity from the outset. However, {Δx_t} is not weakly exogenous in (14.20). This is because b*(L) = β − α − βc(L)^{-1} contains the parameters (c_1, ..., c_q) in c(L), which has been brought about by the behaviour

4 An extremely general treatment of long-run and short-run parameters is found in appendix C of Johansen (1991a).

5 Hiro Toda has kindly pointed out an error in the original draft.

(14.19)


θ. Let θ* = (θ_1, ..., θ_{r−1}). The entire parameters are classified into the long-run parameter α = θ_r and the short-run parameter θ*.

We are concerned with the inference on θ* on the basis of (14.18). The as, b*s, and k may depend upon α = θ_r, in which case α is replaced by α̂ in these parameters as well. θ* may be partitioned as (θ**, c_1, ..., c_q, μ, σ_22), where c(L) in (14.1) is written 1 + c_1L + ... + c_qL^q and σ_22 = E(ε_{2t}²). If {Δx_t} is weakly exogenous in (14.18) with respect to θ*, then the estimation of θ* may be based upon (14.18) alone. As for the MLE conditioned on {Δx_t} one should recognize that the Gaussian likelihood contains the Jacobian determinant on the transformation from (ε_{11}, ..., ε_{1T}) to (y_1, ..., y_T) because the determinant is not unity unless a_0 = 1. Pursuing the non-linear least-squares approach, minimizing

does not even lead to a consistent estimate unless ∂a_0/∂θ* = 0, but minimizing


equation (12.29) containing E(x_t | I_{t−1}) − x_{t−1}. We might estimate (c_1, ..., c_q) in (14.1), and substitute the estimates in (14.20) prior to the estimation of (γ, β, k).

The above has considered (14.2). We might instead consider (14.2'). It follows from (14.2') that

y_t − α̂x_t = γk + (1 − γ)(y_{t−1} − α̂x_{t−1}) + (β − α)Δx_t − βε_{2t} + ε_{1t}. (14.20')

Note that Δx_t and ε_{2t} are correlated. However, since Δx_{t−1} is uncorrelated with ε_{2t} (but correlated with Δx_t unless c(L) = c_0), (1, (y_{t−1} − α̂x_{t−1}), Δx_{t−1}) is a valid instrument to estimate (γk, (1 − γ), (β − α)) by instrumental variable estimation. A consistent estimate of (γ, β, k) can be obtained from the estimate of (γk, (1 − γ), (β − α)).
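The instrumental variable calculation itself is short. The sketch below is a generic, just-identified illustration on simulated data and is not a simulation of (14.20'): the regressor x stands in for the regressor that is correlated with the disturbance, and z for the lagged variable used as its instrument. All names and numerical values are assumptions made for the example.

```python
import numpy as np


def iv_estimate(y, X, Z):
    """Just-identified instrumental variable estimator (Z'X)^{-1} Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)


rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                    # instrument: uncorrelated with e
v = rng.normal(size=n)
x = z + v                                 # regressor: correlated with e through v
e = 0.8 * v + rng.normal(size=n)          # disturbance
y = 0.5 + 1.5 * x + e                     # assumed structural equation

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_iv = iv_estimate(y, X, Z)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("IV slope:", b_iv[1], " OLS slope:", b_ols[1])
```

In this design the OLS slope is pulled away from the assumed value 1.5 by the correlation between x and e, while the IV slope stays close to it, which mirrors the reason for using lagged variables as instruments in the text.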

In the model (14.5) the long-run relation is y_t = αx_t, and α is again the long-run parameter. The second step here is more complicated than in the previous example. First of all a_0 ≠ 1. It is easier to proceed along (14.5') rather than (14.5), but it still leads us to a complicated equation

where I used d + d(1) = (1 − βλ)^{-1}c(1)βλ. The weak exogeneity of Δx_t does not hold, and Δx_t is correlated with ε_{2t}. We might estimate (c_1, ..., c_q, μ) from (14.1), and substitute them in μc(1). Estimation of (λ, βλ) is feasible with an instrumental variable method if we are willing to ignore the dependence of the disturbance variance upon (λ, βλ).

14.3 Hendry Model with the Single-Step Method

In Sections 14.1-2 the inference on the long-run parameter has preceded that on the short-run parameters. It is also possible to perform inference on both the long-run and the short-run parameters at once.

Phillips and Loretan (1991) discuss a non-linear least-squares estimation of the Hendry model. I think that the non-linear least-squares method is useful for the estimation of the Hendry model. The model specification has been explained

TABLE 14.1 Correspondence between notations in the general model (14.1), (14.6) and in the model (14.1), (14.2)

Notations listed: a(L), b(L), c(L), α, k, μ, θ

following (14.7). I assume again that a(L) and b(L) are just a and b respectively so that the model is (14.7'), which is reproduced below.

(14.7')

Note that y_t = αx_t is the long-run equilibrium. The long-run parameter is α, and the short-run parameters comprise all the rest.

Initially I assume that {x_t} is I(1) without a deterministic trend. The minimum of

Turning to the derivatives with respect to δ, a, and b it is seen that they should be normalized by T^{-1/2} rather than T^{-1} as in (14.21a) and (14.21b) on the derivative with respect to α.

Denote ξ_{−1}' = (ξ_2, ..., ξ_{T−1}), Δy_{−1}' = (Δy_2, ..., Δy_{T−1}), Δx' = (Δx_3, ..., Δx_T), ε' = (ε_{13}, ..., ε_{1T}), and x_{−1}' = (x_2, ..., x_{T−1}). Under the present assumption about {x_t}, i.e. absence of a deterministic trend, T^{-3/2} Σ x_{t−1}Δx_t,

(14.21d)

where ξ_t = y_t − α⁰x_t. Rewriting the left-hand side we get

(14.21c)

(14.21b)

Therefore the linear approximation to (14.21b) minus (14.21a) yields

At the true values of the parameters

(14.21a)

is sought to estimate (δ, a, b, α). See Phillips and Loretan (1991) for their experiences with a number of algorithms for minimization of this particular type of functions. Let (δ̂, â, b̂, α̂) be the (δ, a, b, α) that minimizes S. For the theoretical analysis toward the asymptotic distribution derivatives of S with respect to (δ, a, b, α) are set to zero, and the non-linear expressions that result are linearly approximated about (δ⁰, a⁰, b⁰, α⁰). This is a standard procedure in the non-linear regression. I show the approximation only on the differentiation with respect to α. From


(14.22)

Then (δ̂, â, b̂, α̂) satisfies (14.22) asymptotically.6 Notice that the system of equations (14.22) is decomposed into two parts. One is just

6 Readers might wonder about the existence of solution and its consistency. The function to be minimized is S(θ) with θ' = (δ, a, b, α). The Taylor expansion leads us to

+ terms involving the third-, fourth-, . . . order derivatives.

From our DGP it follows that

where the matrix in [ ] is positive definite in probability 1. (Earlier it was noted that δ⁰ < 0.) For the terms involving third-order derivatives e.g.

The conclusion is that when S(θ) − S(θ⁰) is looked upon as a function of D_T(θ − θ⁰), it is convex downward if T is sufficiently large. A unique minimum exists, in probability 1, i.e. the non-linear least-squares estimator exists, if T is sufficiently large. Plausibility of √T-consistency of (δ̂, â, b̂) and T-consistency of α̂ has also been demonstrated because the argument is D_T(θ − θ⁰) rather than (θ − θ⁰).


• r5//2(a — or°)2(<5 — 5°) that enters into (*) above, we note thatsince it is T 5/2 335

da2X

are all We obtain


which is analogous to the equations for the OLS estimator in the co-integrated regression in Chapter 13. The other is

which is mixed Gaussian. Readers should recognize that this is identical to the limit distribution on $\hat\alpha$ in the first step of the two-step method that we would have if the Cochrane-Orcutt transformation were not performed there (see the last paragraph of Section 14.1 above). On the other hand, the process $\{(\xi_{t-1}, \Delta y_{t-1}, \Delta x_t)\}$ is stationary, and the limiting distribution of $\sqrt{T}((\hat\delta - \delta^0), (\hat a - a^0), (\hat b - b^0))$ is standard, and is represented as $N(0, \sigma_{11}A^{-1})$, where $A = \mathrm{plim}\, T^{-1}(\xi_{-1}, \Delta y_{-1}, \Delta x)'(\xi_{-1}, \Delta y_{-1}, \Delta x)$. It is interesting to observe that the present limit distribution on $(\hat\delta, \hat a, \hat b)$ is identical to that in the two-step method given earlier in Section 14.1. Also note that the inference on $(\delta, a, b)$ and that on $\alpha$ are separated. See Phillips (1990) for the reasoning to establish the asymptotic independence between the estimators of long-run and short-run parameters.7
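The single-step estimation can be illustrated numerically. The sketch below is a minimal illustration under an assumption that the text here does not display: the estimated equation is taken to be $\Delta y_t = \delta(y_{t-1} - \alpha x_{t-1}) + a\Delta y_{t-1} + b\Delta x_t + \varepsilon_{1t}$, which is what the regressor vectors $(\xi_{-1}, \Delta y_{-1}, \Delta x)$ suggest. Because the map $(\delta, \alpha) \mapsto (\delta, \gamma)$ with $\gamma = -\delta\alpha$ is one-to-one when $\delta \neq 0$, OLS on the unrestricted form gives the same minimizer as non-linear least squares, and $\hat\alpha = -\hat\gamma/\hat\delta$. The function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_ecm(T, delta, alpha, a, b, seed=0):
    """Simulate (y, x) from the assumed single-equation error-correction model."""
    rng = np.random.default_rng(seed)
    e1 = rng.standard_normal(T)             # disturbance of the y equation
    x = np.cumsum(rng.standard_normal(T))   # driftless random walk, so x is I(1)
    y = np.zeros(T)
    for t in range(2, T):
        xi_lag = y[t - 1] - alpha * x[t - 1]   # lagged equilibrium error
        dy_lag = y[t - 1] - y[t - 2]
        dx = x[t] - x[t - 1]
        y[t] = y[t - 1] + delta * xi_lag + a * dy_lag + b * dx + e1[t]
    return y, x


def fit_ecm(y, x):
    """OLS on dy_t = delta*y_{t-1} + gamma*x_{t-1} + a*dy_{t-1} + b*dx_t + e_t.

    Returns (delta_hat, alpha_hat, a_hat, b_hat) with alpha_hat = -gamma_hat/delta_hat.
    """
    dy = np.diff(y)   # dy[i] = y[i+1] - y[i]
    dx = np.diff(x)
    lhs = dy[1:]                                           # dy_t
    Z = np.column_stack([y[1:-1], x[1:-1], dy[:-1], dx[1:]])
    coef, *_ = np.linalg.lstsq(Z, lhs, rcond=None)
    delta_hat, gamma_hat, a_hat, b_hat = coef
    return delta_hat, -gamma_hat / delta_hat, a_hat, b_hat


y, x = simulate_ecm(2000, delta=-0.5, alpha=2.0, a=0.2, b=0.3, seed=1)
print(fit_ecm(y, x))
```

In repeated runs the estimate of $\alpha$ should settle much faster than the short-run coefficients, in line with the $T$- versus $\sqrt{T}$-consistency discussed above.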

Since

where $s$ is the standard error of regression.

Let us turn to the case where $\{x_t\}$ is $I(1)$ with a linear deterministic trend so that $\Delta x_t = \mu +$ (a stationary process with zero mean). The long-run equilibrium is still $y_t - \alpha x_t$. The partial derivative of $S$ with respect to $\alpha$ was previously normalized by $T^{-1}$ in (14.21a)-(14.21d), but now the normalizer should be

7 Koichi Maekawa has kindly pointed this out to me.

which is mixed Gaussian. Therefore on a sort of t-statistic

(14.24)

(14.23)

which is a standard result in the traditional econometrics dealing with $I(0)$ variables.

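Let $E(\varepsilon_{1t}^2) = \sigma_{11}$, $E(\varepsilon_{1t}\varepsilon_{2t}) = 0$, and $E(\varepsilon_{2t}^2) = \sigma_{22}$ in (14.1'). Let $(w_1(r), w_2(r))$ be a Wiener process with covariance matrix $\mathrm{diag}[\sigma_{11}, c(1)^2\sigma_{22}]$. Then it is seen from (14.22) that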


and $T^{-3}\sum x_{t-1}^2 \to \tfrac{1}{3}(\mu^0)^2$. The conclusion is that our asymptotic inference procedure is invariant to whether $\mu^0 = 0$ or $\mu^0 \neq 0$.

14.4 Dynamic Equation with the Single-Step Method

Let us consider again the general dynamic model, (14.1) and (14.6), keeping in mind that $k$, the $a$s, and the $b$s are functions of the $r$-dimensional vector parameter, $\theta = (\theta_1, \ldots, \theta_r)$.

(14.15) enables us to rewrite (14.6) as

(14.25)

where $a(L) = a_0 + a_1L + \ldots + a_pL^p$, $a(z) = 0$ has all roots outside the unit circle, $b^*(L)$ may be an infinite power series, but $\{b_j^*\}$ is bounded by a decaying exponential. Moreover, I assume here that $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ in (14.1) are independent.

In general the $a$s, $b$s, and $k$ depend upon the parameters for (14.1), i.e. $c_1, \ldots, c_q, \mu, \sigma_{22}$, as indicated in Table 14.2 for the special case, (14.1) and (14.5). In practice we had better estimate them on the basis of (14.1) alone in order to save computation costs. The estimates are then substituted in the $a$s, $b$s, and $k$. In the following description $\theta$ does not include $c_1, \ldots, c_q, \mu, \sigma_{22}$.

Let us explore the least-squares method on (14.25). Writing $f(L) = a_0^{-1}a(L) = 1 + f_1L + \ldots + f_pL^p$, $d = a_0^{-1}k$, and $g(L) = a_0^{-1}b^*(L)$, and truncating $b^*(L)$ and hence $g(L)$ at a finite order,


$T^{-3/2}$. (14.21d) is replaced by

Since

A sort of t-statistic, (14.24), converges to $N(0, 1)$ because

is minimized with respect to $\theta$. The $\theta$ affects $S$ through $\alpha$, $d$, the $f$s and $g$s, which may be assembled into a vector $\vartheta$. The case where $\theta = \vartheta$ has been investigated in Phillips and Loretan (1991). In the present model $\vartheta$ has a larger dimensionality than $\theta$, and the linear approximation to $\partial S/\partial\theta$ is developed through

(14.26)

and are all


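$(\partial\vartheta/\partial\theta'(\theta^0))(\theta - \theta^0)$ and $(\partial^2\vartheta_j/\partial\theta\,\partial\theta'(\theta^0))(\theta - \theta^0)$. It is reasonable to assume that for each $i = 1, \ldots, r$, $\partial\vartheta/\partial\theta_i$ is not a zero vector at the true value of $\theta$.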

I now suppose that $\alpha = \theta_r$ because, as indicated in Section 14.2, $\theta$ can be reparametrized so that this holds. Let us assume that there are no functional relationships among elements of $\theta$, because, if there are, the dimensionality of $\theta$ can be reduced. Let us also assume that the true value of $\theta$ is not on the boundary of the admissible domain of $\theta$. Then each element of $\theta$ is free to move in a small neighbourhood of the true value. It is such a neighbourhood that we are concerned with in developing the asymptotic theory. Since $\alpha$ is an element of $\vartheta$ as well as of $\theta$, $\partial\alpha/\partial\theta'$ is a portion of $\partial\vartheta/\partial\theta'$. And we have

(14.27)

However, partial derivatives of elements of $\vartheta$ other than $\alpha$ may not be zero no matter whether they are with respect to $\theta_r$ or $\theta_i$ $(i \neq r)$.

The sort of block-diagonality as observed in (14.22) is revealed without (14.27), but (14.27) certainly simplifies the asymptotic distribution of the non-linear least-squares estimator. The asymptotic theory requires that the truncation point of $g(L)$, $m$, should be increased with the sample size $T$ at a speed slower than $T$, but I do not bother to consider such mathematical sophistication.

Let $\hat\theta = (\hat\theta_1, \ldots, \hat\theta_r)$ be the $\theta$ that minimizes $S$ in (14.26). Construct for $i = 1, \ldots, r-1$

TABLE 14.2

Columns: (14.1), (14.6) and (14.1), (14.5)

Rows: $a(L)$, $f(L)$, $k$, $d$, $b(L)$, $b^*(L)$, $g(L)$, $\alpha$


(14.28)

where $(w_1(r), w_2(r))$ is a Wiener process with covariance matrix $\mathrm{diag}[\sigma_{11}, c(1)^2\sigma_{22}]$. For a sort of t-statistic it holds that

(14.31)

If $\mu^0 \neq 0$ in (14.1), $T^{3/2}(\hat\theta_r - \theta_r^0) \to N(0, 3(\mu^0)^{-2}a(1)^{-2}\sigma_{11})$, and yet (14.31) holds true.

It is important to compare the usefulness of the single-step and the two-step methods. First, concerning the long-run parameter, $\alpha$, it is precisely the t-statistic (on the Cochrane-Orcutt transformed variable) that converges to $N(0, 1)$ in the two-step method, whereas it is the more complicated form, the left-hand side of (14.31), in the single-step method. The finite sample distributions of both statistics need to be investigated. Second, concerning the estimation of short-run parameters, the $\theta$s other than $\alpha$, the single-step estimation and the minimization of (14.19) in the two-step method yield the identical limiting distribution. However, the finite sample distributions may well be different. Third,

where it is assumed that $p < m$. The process $\{(\xi_1, \ldots, \xi_{r-1})\}$ is stationary, and for $(\hat\theta_1, \ldots, \hat\theta_{r-1})$ we have

(14.29)

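In deriving this $\partial\alpha/\partial\theta_i = 0$ plays an important role. The term $a_0^{-1}$ comes from the last term in (14.25). Let

Then from (14.29) it follows that $\sqrt{T}(\hat\theta_1 - \theta_1^0, \ldots, \hat\theta_{r-1} - \theta_{r-1}^0)$ converges to $N(0, \sigma_{11}a_0^{-2}A^{-1})$.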

Concerning $\hat\theta_r$, if $\mu^0 = 0$ in (14.1), we obtain

(14.30)


the requirement on the exogeneity is less severe in the two-step method than in the single-step method in so far as the estimation of the long-run parameter is concerned. When the OLS is run along (14.11) or (14.16) the coefficients of $\Delta x_t, \Delta x_{t-1}, \ldots$ are unconstrained, and are determined automatically to eliminate the relation between $x_t$ and $\varepsilon_{1t}$. The Granger non-causality condition is sufficient to ensure the mixed Gaussian property of $\hat\alpha$. Moreover, if the condition is in doubt, one could introduce the forwarded $\Delta x_t$ as suggested in Section 13.3.2. In the single-step method the coefficients of $\Delta x_t, \Delta x_{t-1}, \ldots$ are constrained as functions of the common parameter, $\theta$. This is the reason why I assumed the strict exogeneity of $x_t$ in the single-step method.

In my judgement the third point concerning the exogeneity would outweigh the other points unless an enormous difference in finite sample distributions between the two estimators is revealed.8

8 Throughout the present chapter the disturbance terms are assumed to be i.i.d. Sugihara (1994)investigates the case where they are ARMA.


15

Maximum-Likelihood Inference Theory of Co-integrated VAR

Comprehensive inference procedures for the co-integration have been developed by Johansen (1988, 1991a, 1992b, 1992c, 1994) on the basis of the maximum-likelihood analysis of the VAR error-correction representation. The procedures include among others (i) data-based selection of the co-integration rank, and (ii) testing restrictions on the co-integration space with a given rank. Limiting distributions used for the selection of rank are completely free from all the nuisance parameters. Moreover, the test statistics on the co-integration space are distributed asymptotically in $\chi^2$ like those in Chapter 13. Deterministic trends are properly taken into consideration, and the method can be easily adapted to structural changes in the deterministic trends.

Johansen (1988, 1991a) has been motivated in part by the literature in the field called the reduced-rank regression, which is closely related to the econometric model of simultaneous equations. Identifiability of coefficients in a structural equation is determined by the rank of a certain submatrix of the reduced-form parameters. In the reduced-form regression of $p$ dependent variables, $y$, upon $q$ independent variables, $x$, the independent variable is partitioned as $x' = (x_1', x_2')$ where $x_2$ consists of $q_2$ variables. The $i$th observation of $y$ is generated by

$$y_i = B_1x_{1i} + B_2x_{2i} + \varepsilon_i, \qquad (15.1)$$

where the matrix $B_2$ is $p \times q_2$. Anderson (1951) considered how to test $H_0: \rho(B_2) = r$ against $H_1: \rho(B_2) > r$. The results are as follows: (i) The likelihood ratio is expressed by roots of a certain determinantal equation, which roots are later found identical to the squares of the canonical correlation coefficients between $x_2$ and $y$ after eliminating linear effects of $x_1$; (ii) $(-2) \times$ (the log-likelihood ratio) is asymptotically distributed under $H_0$ in the $\chi^2$ distribution with $(p - r)(q_2 - r)$ degrees of freedom, which is consistent with the general theory of the maximum-likelihood principle, because $\rho(B_2) = r$ implies $(p - r)(q_2 - r)$ constraints upon the elements of $B_2$.1 The likelihood ratio test is found useful to determine $\rho(B_2)$.

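1 Given an $m \times n$ matrix $A$, partition it as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},$$

where $A_{11}$ is $r \times r$ and non-singular. Then $\rho(A) = r$ implies $A_{22} = A_{21}A_{11}^{-1}A_{12}$, which is $(m-r)(n-r)$ constraints on the elements of $A$.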
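The constraint count in this footnote can be checked numerically. The sketch below is only an illustration: it builds a random $5 \times 4$ matrix of rank 2 (dimensions chosen arbitrarily) and confirms that the lower-right block is determined by the other three blocks.

```python
import numpy as np


def rank_r_matrix(m, n, r, seed=0):
    """Random m x n matrix of rank r, built as a product of two full-rank factors."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, r)) @ rng.standard_normal((r, n))


m, n, r = 5, 4, 2
A = rank_r_matrix(m, n, r)
A11, A12 = A[:r, :r], A[:r, r:]
A21, A22 = A[r:, :r], A[r:, r:]

# Under rank r with A11 non-singular, A22 is implied by the other blocks.
implied_A22 = A21 @ np.linalg.solve(A11, A12)
n_constraints = (m - r) * (n - r)   # one equality per element of A22

print(np.allclose(A22, implied_A22), n_constraints)
```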


A model such as (15.1) is called the reduced-rank regression. Its characteristic is that when the coefficient matrix is $p \times q$, the $pq$ elements of the matrix are not free to move in the $pq$ dimensional Euclidean space, but are bound by some equality constraints. Anderson and Kunitomo (1992, 1994) extend Anderson (1951) to incorporate not only the likelihood ratio test but also the Wald and the Lagrange multiplier tests, and Kunitomo (1994) applies the results to the co-integration analysis in a general non-stationary process. However, I shall follow Johansen (1991a) in the following presentation.

The equation (15.1) resembles the VAR error-correction representation, (12.24). The variables $y$, $x_1$, and $x_2$ in (15.1) correspond respectively to $\Delta x_t$, $(\Delta x_{t-1}, \ldots, \Delta x_{t-q+1})$, and $x_{t-1}$ in (12.24). The canonical correlation in (15.1) between $x_2$ and $y$ after eliminating effects of $x_1$ corresponds to the canonical correlation in (12.24) between $\Delta x_t$ and $x_{t-1}$ after eliminating effects of $(\Delta x_{t-1}, \ldots, \Delta x_{t-q+1})$. Needless to say, the distributions of the statistics are different between the two models because of the non-stationarity in (12.24).

Prior to the development of co-integration analysis the VAR models had been estimated either by the OLS on the levels of $x_t$,

or by OLS on the difference, $\Delta x_t$,

The former OLS is consistent under all conceivable conditions,2 but not efficient if $\{x_t\}$ is $I(1)$ and co-integrated. The inefficiency is due to the reduced-rank condition being ignored, where the condition is that $\Pi(1) = I - \Pi_1 - \ldots - \Pi_q$ has a rank less than the number of elements of $x_t$ when $\{x_t\}$ is co-integrated. The latter OLS on $\Delta x_t$ is consistent only when $\{x_t\}$ is $I(1)$ and not co-integrated. If $\{x_t\}$ is co-integrated, the VAR in $\Delta x_t$ is an invalid model as shown in Section 12.2.2.

Banerjee et al. (1993) present the Johansen method in detail. I shall omit the explanation of its mathematical aspect. Instead I shall give more consideration to those points that require clarification in light of the criticisms raised by Phillips (1991d) against its implementation.

The plan of the present chapter is as follows. Section 15.1 considers the determination of co-integration rank from the data. The derivation of the test statistic and the sequence of hypothesis testing for determination of rank are described in fair detail following Johansen (1988, 1991a, b, 1992b) and incorporating a criticism made in Phillips (1991d). Results of simulation studies in Toda (1994, 1995) and Morimune and Mantani (1995) on the determination of rank and lag order are summarized. Structural changes are introduced in deterministic trends as they are found important in Part I. Section 15.2 is concerned with the identification problem on $B$ and the hypothesis testing on the co-integration space, following Johansen (1991a) and Johansen and Juselius (1990, 1992) and acknowledging a comment in Phillips (1991d). I shall do my best to explain differences between the co-integrated regression in Chapter 13 and the Johansen method in this chapter. Section 15.3 criticizes, following Phillips (1991d), a misleading practice of listing the estimates of co-integrating vectors. Section 15.4 deals with a system conditioned on exogenous variables. Following Johansen (1992c, 1992d) and Johansen and Juselius (1992) a condition for weak exogeneity is presented. Consideration of the weak exogeneity contributes again to clarifying the relationships between the VAR in this chapter and the co-integrated regression in Chapter 13. Granger non-causality in the long run is called co-integrating exogeneity in Hunter (1992). I shall point out a simple but useful case where the weak exogeneity, the non-causality in the long run, and the validity of co-integrated regressions are all united. I shall also introduce the results in Banerjee et al. (1993: 288-91) on the weak exogeneity and the results in Toda and Phillips (1993) on the Granger causality test. Section 15.5 surveys empirical applications of the maximum-likelihood analysis of VAR. It has influenced my entire writing of Part II.3

2 Its limiting distribution is derived in Ahn and Reinsel (1988, 1990).

The notations such as $\det[\;]$ and $\rho(\;)$ are continued from Chapters 11 and 12. The Johansen method uses the flexible definition of co-integration, not requiring that each variable is $I(1)$ individually. As indicated in Sections 12.1 and 12.3.7, when an $I(0)$ variable is included, a unit vector can be a co-integrating vector. How to distinguish such co-integrations from long-run relationships was indicated in Section 12.3.7.

15.1 Determination of Co-integration Rank

15.1.1 Introduction

Let us consider the model (12.26), which is reproduced here as

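$$\Delta x_t = AB'x_{t-1} + \Gamma_1\Delta x_{t-1} + \ldots + \Gamma_{q-1}\Delta x_{t-q+1} + \mu + \varepsilon_t. \qquad (15.2)$$

Johansen (1991a) has $x_{t-q}$ instead of $x_{t-1}$ on the right-hand side of (15.2), but the essence of the following reasoning does not depend on whether $x_{t-1}$ or $x_{t-q}$ is placed. Let $E(\varepsilon_t\varepsilon_t') = \Omega$. Unknown parameters are $A$, $B$, $\Gamma_1, \ldots, \Gamma_{q-1}$, $\mu$, $\Omega$, and $r$. It is assumed that the lag order, $q$, is known. The presence of $\mu$ on the right-hand side generates a deterministic linear trend in $x_t$ unless $A_\perp'\mu = 0$.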

3 Phillips (1993) applies the idea of fully modified least squares, given in Appendix 6, to the estimation of a possibly co-integrated VAR, and obtains results that are remarkable in terms of the asymptotic theory. It is not included in the present survey because it depends heavily upon the estimation of the long-run covariance matrix.


The first problem is how to select $r$, the co-integration rank, in light of the available data. We are concerned with the hypotheses,

(15.3)

It is important to bear in mind that the hypotheses here are inequalities ($\le$) rather than equalities. (Different hypotheses will be introduced later.) Altogether $(k + 1)$ hypotheses, $H(k), H(k-1), \ldots, H(0)$, are available, and they form a nested sequence. What each hypothesis means can be considered on the basis of Section 12.2.2. $H(k)$ places no restrictions upon the parameters: $\{x_t\}$ may be stationary or non-stationary, and if it is non-stationary it may or may not be co-integrated. $H(k-1)$ is equivalent to the singularity of $\Pi(1)$. Recalling that the non-singularity of $\Pi(1)$ is the stationarity of $x_t$, we see that $H(k-1)$ specifies $\{x_t\}$ to be non-stationary, but it may or may not be co-integrated. Therefore testing for $H(k-1)$ against $H(k)$ is a test for non-stationarity with the non-stationarity as the null. $H(k-2)$ specifies $\{x_t\}$ to be non-stationary and moreover not to have $(k-1)$ linearly independent co-integrating vectors. Thus testing for $H(k-2)$ against $H(k-1)$ is a test for at most $(k-2)$ co-integrating vectors. Finally $H(0)$ means absence of co-integration.

15.1.2 Derivation of likelihood ratio test statistics

The likelihood ratio test may be applied to the above sequence of hypotheses. For the determination of $r$ all the parameters but $r$ must be concentrated out of the log-likelihood function. It is seen that, given $B$, (15.2) is a special case of the seemingly unrelated regression (SUR) model with $B'x_{t-1}, \Delta x_{t-1}, \ldots, \Delta x_{t-q+1}, 1$ as regressors. It is special in having the identical set of right-hand side variables in each equation, and it is well known for such a special SUR model that the maximum-likelihood estimators of all parameters are identical to the OLS estimators, in which the disturbance covariance structure is not exploited for the estimation. All the parameters other than $r$ and $B$ can be concentrated out by replacing them by their OLS estimators. But (15.2) is a reduced-rank SUR, and $B$ will require special consideration.
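The well-known SUR result cited above can be verified on simulated data. The sketch below is an illustration only (dimensions, seed, and variable names are arbitrary): it compares equation-by-equation OLS with the GLS estimator that uses the true disturbance covariance $\Omega \otimes I_T$, in a system where every equation has the same regressor matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
T, k, m = 50, 3, 4                       # observations, equations, regressors
X = rng.standard_normal((T, m))          # identical regressors in every equation
M = rng.standard_normal((k, k))
Omega = M @ M.T + np.eye(k)              # positive definite disturbance covariance
U = rng.standard_normal((T, k)) @ np.linalg.cholesky(Omega).T
Y = X @ rng.standard_normal((m, k)) + U

# Equation-by-equation OLS: column j holds the coefficients of equation j.
ols = np.linalg.lstsq(X, Y, rcond=None)[0]

# GLS on the stacked system y = (I_k kron X) beta + u, Var(u) = Omega kron I_T.
big_X = np.kron(np.eye(k), X)
W = np.kron(np.linalg.inv(Omega), np.eye(T))
y = Y.T.reshape(-1)                      # stack equation 1, then equation 2, ...
gls = np.linalg.solve(big_X.T @ W @ big_X, big_X.T @ W @ y).reshape(k, m).T

print(np.allclose(ols, gls))
```

The two estimators coincide whatever $\Omega$ is, which is why the covariance structure can be ignored when concentrating out the unrestricted coefficients.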

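Let $z_{0t} = \Delta x_t$, $z_{1t} = x_{t-1}$, $z_{2t} = (\Delta x_{t-1}', \ldots, \Delta x_{t-q+1}', 1)'$, and $\Gamma = (\Gamma_1, \ldots, \Gamma_{q-1}, \mu)$. Then (15.2) is

$$z_{0t} = AB'z_{1t} + \Gamma z_{2t} + \varepsilon_t. \qquad (15.4)$$

(15.4) is a reduced-rank regression model. Assume that $\{\varepsilon_t\}$ is i.i.d. and $\varepsilon_t$ is $N(0, \Omega)$. The log-likelihood generated by (15.4) is asymptotically, up to an additive constant,

$$\ln L = -\frac{T}{2}\ln\det[\Omega] - \frac{1}{2}\sum_{t}(z_{0t} - AB'z_{1t} - \Gamma z_{2t})'\Omega^{-1}(z_{0t} - AB'z_{1t} - \Gamma z_{2t}), \qquad (15.5a)$$

ignoring the effect of initials.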


Let us consider first the case, $r = k$. Writing $\Pi = AB'$ and concentrating it out of (15.5c) through the OLS of $\Pi$, $\hat\Pi = S_{01}S_{11}^{-1}$, we have

(I5.5d)

with

(15.8)

Secondly, as for the other extreme case, $r = 0$, i.e. $A = B = 0$, (15.5c) is reduced to

(I5.5e)

Let us then consider an intermediate case, $0 < r < k$. To maximize (15.5c) we must minimize $\det[\hat\Omega(A, B, \hat\Gamma)]$ in (15.5c). Given $B$, $\det[\hat\Omega(A, B, \hat\Gamma)]$ is minimized with respect to $A$ by setting $A$ to its OLS, $\hat A(B) = S_{01}B(B'S_{11}B)^{-1}$. Substituting it into (15.5c) we are led to minimizing

$$\det[S_{00} - S_{01}B(B'S_{11}B)^{-1}B'S_{10}] \qquad (15.9)$$


The concentration will proceed in the order of $\Gamma$, $\Omega^{-1}$, $A$, and $B$. To concentrate $\Gamma$ out let its OLS estimator be

(15.6)

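Substituting (15.6) into $(z_{0t} - AB'z_{1t} - \Gamma z_{2t})$ in (15.5a) we obtain the OLS residuals, but the residuals are also obtained as follows. Regress $z_{it}$, $i = 0, 1$, upon $z_{2t}$ to get the residuals

$$\hat z_{it} = z_{it} - \Big(\sum_t z_{it}z_{2t}'\Big)\Big(\sum_t z_{2t}z_{2t}'\Big)^{-1}z_{2t}, \quad i = 0, 1,$$

and form $\hat z_{0t} - AB'\hat z_{1t}$. The concentration with respect to $\Gamma$ yields the log-likelihood

$$\ln L = -\frac{T}{2}\ln\det[\Omega] - \frac{1}{2}\sum_t(\hat z_{0t} - AB'\hat z_{1t})'\Omega^{-1}(\hat z_{0t} - AB'\hat z_{1t}). \qquad (15.5b)$$

The maximum-likelihood estimator of $\Omega$ based on (15.5b) is

$$\hat\Omega(A, B, \hat\Gamma) = T^{-1}\sum_t(\hat z_{0t} - AB'\hat z_{1t})(\hat z_{0t} - AB'\hat z_{1t})'. \qquad (15.7)$$

Substituting (15.7) in (15.5b), we obtain the further concentrated log-likelihood,

$$\ln L = -\frac{T}{2}\ln\det[\hat\Omega(A, B, \hat\Gamma)] + \text{const}, \qquad (15.5c)$$

because the second term on the right-hand side of (15.5b) does not contain unknown parameters when $\hat\Omega$ is substituted for $\Omega$.

Define $S_{ij} = T^{-1}\sum_t \hat z_{it}\hat z_{jt}'$, $i, j = 0, 1$.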


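with respect to $B$. This determinant is invariant when $B$ is replaced by $BQ$ with any non-singular $r \times r$ matrix $Q$, which reflects the unidentifiability of $B$. Let $\lambda_1 > \ldots > \lambda_k$ be the decreasingly ordered roots of

$$\det[\lambda S_{11} - S_{10}S_{00}^{-1}S_{01}] = 0. \qquad (15.10)$$

The $\lambda$s are real and non-negative as they are eigenvalues of $S_{11}^{-1/2}S_{10}S_{00}^{-1}S_{01}S_{11}^{-1/2}$. They are less than unity because $S_{11} - S_{10}S_{00}^{-1}S_{01}$ is positive definite. In fact the $\lambda$s are squares of canonical correlation coefficients between $\hat z_0$ and $\hat z_1$. Let $v_i$ be a $k \times 1$ vector associated with $\lambda_i$ such that

$$(\lambda_iS_{11} - S_{10}S_{00}^{-1}S_{01})v_i = 0, \quad i = 1, \ldots, k, \qquad (15.11)$$

normalized so that $V'S_{11}V = I_k$ for $V = (v_1, \ldots, v_k)$.

The minimized value of (15.9) is $\det[S_{00}]\prod_{i=1}^{r}(1 - \lambda_i)$,4 and one way to attain the minimum is to set

$$\hat B = (v_1, \ldots, v_r). \qquad (15.12)$$

For $0 < r < k$ the concentrated log-likelihood is

$$\ln L = -\frac{T}{2}\Big\{\ln\det[S_{00}] + \sum_{i=1}^{r}\ln(1 - \lambda_i)\Big\} + \text{const}. \qquad (15.5f)$$

We also have

$$\hat A = S_{01}\hat B(\hat B'S_{11}\hat B)^{-1} = S_{01}\hat B. \qquad (15.13)$$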

Note that only the largest $r$ roots of (15.10) and the associated vectors enter into (15.12) and (15.5f) though (15.10) has altogether $k$ roots. The above procedure is identical to calculations in the reduced-rank regression, and parallel to the derivation of the limited information maximum-likelihood estimator in the simultaneous equations model as pointed out in Banerjee et al. (1993: 264-5).
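The calculations of this subsection can be sketched in a few lines. The code below is a minimal illustration, not the author's implementation: it assumes that $S_{ij}$ are the product-moment matrices of the residuals from regressing $\Delta x_t$ and $x_{t-1}$ on $(\Delta x_{t-1}, \ldots, \Delta x_{t-q+1}, 1)$, solves (15.10) through a Cholesky factor of $S_{11}$, and normalizes the vectors so that $V'S_{11}V = I_k$. The function name `johansen_eigen` and the simulated data are hypothetical.

```python
import numpy as np


def johansen_eigen(x, q):
    """Roots and vectors of det[lam*S11 - S10 S00^{-1} S01] = 0.

    x : (n_obs, k) array of levels; q : lag order of the VAR in levels (q >= 1).
    Returns (lam, V, S00, S01, S11), lam in decreasing order, V' S11 V = I.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x, axis=0)                 # dx[i] = x[i+1] - x[i]
    n = dx.shape[0]
    z0 = dx[q - 1:]                         # Delta x_t
    z1 = x[q - 1:-1]                        # x_{t-1}
    lagged = [dx[q - 1 - j:n - j] for j in range(1, q)]   # Delta x_{t-j}
    z2 = np.column_stack(lagged + [np.ones(len(z0))])

    def residual(z):
        coef, *_ = np.linalg.lstsq(z2, z, rcond=None)
        return z - z2 @ coef

    r0, r1 = residual(z0), residual(z1)
    T = len(r0)
    s00, s01, s11 = r0.T @ r0 / T, r0.T @ r1 / T, r1.T @ r1 / T

    c = np.linalg.cholesky(s11)                    # S11 = c c'
    half = np.linalg.solve(c, s01.T)               # c^{-1} S10
    sym = half @ np.linalg.solve(s00, half.T)      # c^{-1} S10 S00^{-1} S01 c^{-T}
    lam, e = np.linalg.eigh(sym)
    order = np.argsort(lam)[::-1]
    lam, e = lam[order], e[:, order]
    V = np.linalg.solve(c.T, e)                    # V = c^{-T} E
    return lam, V, s00, s01, s11


# Simulated bivariate example with one co-integrating relation x2 - x1.
rng = np.random.default_rng(0)
x1 = np.cumsum(rng.standard_normal(500))
x = np.column_stack([x1, x1 + rng.standard_normal(500)])
lam, V, s00, s01, s11 = johansen_eigen(x, q=1)
B_hat = V[:, :1]                 # (15.12) with r = 1
A_hat = s01 @ B_hat              # (15.13), using B_hat' S11 B_hat = I
print(lam, B_hat.ravel() / B_hat[0, 0])
```

With the co-integration rank set to $r$, the first $r$ columns of `V` play the role of $\hat B$; as the text stresses, they are determined only up to a non-singular transformation.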

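4 (15.9) is equal to $\det[S_{00}]\det[B'S_{11}B - B'S_{10}S_{00}^{-1}S_{01}B]/\det[B'S_{11}B]$ because

$$\det\begin{bmatrix} S_{00} & S_{01}B \\ B'S_{10} & B'S_{11}B \end{bmatrix} = \det[S_{00}]\det[B'S_{11}B - B'S_{10}S_{00}^{-1}S_{01}B] = \det[B'S_{11}B]\det[S_{00} - S_{01}B(B'S_{11}B)^{-1}B'S_{10}].$$

We must minimize

$$\det[B'(S_{11} - S_{10}S_{00}^{-1}S_{01})B]/\det[B'S_{11}B] \qquad (*)$$

with respect to $B$. Let $\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_k]$, where the $\lambda$s are defined in relation to (15.10), and let $V$ be the matrix that consists of $v_1, \ldots, v_k$ defined in (15.11). The matrix of orthonormalized eigenvectors of $S_{11}^{-1/2}S_{10}S_{00}^{-1}S_{01}S_{11}^{-1/2}$ is $S_{11}^{1/2}V$, and the diagonal matrix of eigenvalues is $\Lambda$. From these it follows that $V'S_{11}V = I_k$ and $V'S_{10}S_{00}^{-1}S_{01}V = \Lambda$. Transform $B$ to $F$ through $B = VF$, and then transform $F$ to $Y$ through $F(F'F)^{-1/2} = Y$. Then (*) above is $\det[Y'(I_k - \Lambda)Y]$, which is to be minimized subject to $Y'Y = I_r$. The minimum is achieved by $Y = [I_r, 0]'$ because $1 > \lambda_1 > \ldots > \lambda_k > 0$. $F$ is not unique, but we may take it as $Y$. Then $B = V[I_r, 0]'$.

The minimized value of (*) is $\prod_{i=1}^{r}(1 - \lambda_i)$, and the minimized value of $\det[S_{00} - S_{01}B(B'S_{11}B)^{-1}B'S_{10}]$ is $\det[S_{00}]\prod_{i=1}^{r}(1 - \lambda_i)$.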


We now find it more convenient to express (15.5d) also by the $\lambda$s. In fact there is no obstacle to extending the above reasoning to $r = k$, and

15.1.3 Determination of co-integration rank

I now introduce

Johansen (1991a) shows that under $H(r)$ $\lambda_1, \ldots, \lambda_r$, i.e. the $r$ largest roots of (15.10), converge in probability to positive numbers (in fact, eigenvalues of a certain non-stochastic positive definite matrix), and $\lambda_{r+1}, \ldots, \lambda_k$ converge to zero in probability. Moreover, $(T\lambda_{r+1}, \ldots, T\lambda_k)$ do converge in distribution. To derive the limiting distributions under $H(r)$ let $w(s)' = (w_1(s), \ldots, w_{k-r}(s))$, $0 \le s \le 1$, be a vector standard Wiener process, and let $f^{(1)}(s)$ be the first $(k - r - 1)$ elements of $w(s) - \int_0^1 w(s)\,ds$, and $f(s)' = (f^{(1)}(s)', s - 1/2)$. Construct a $(k - r) \times (k - r)$ random matrix

$$\int_0^1 (dw)f'\Big[\int_0^1 ff'\,ds\Big]^{-1}\int_0^1 f(dw)'.$$

(15.5d')

The likelihood ratio test statistic for testing $H(r)$ against $H(r + 1)$, $r = 0, 1, \ldots, k - 1$, is derived from (15.5e), (15.5f), and (15.5d') as

$$-T\ln(1 - \lambda_{r+1}). \qquad (15.14)$$

Also the likelihood ratio statistic for testing $H(r)$ against $H(k)$, $r = 0, 1, \ldots, k - 1$, is

$$-T\sum_{i=r+1}^{k}\ln(1 - \lambda_i). \qquad (15.15)$$
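Given the ordered roots of (15.10), both statistics are one-line computations. The sketch below assumes the standard forms $-T\ln(1-\lambda_{r+1})$ for the maximum eigenvalue statistic and $-T\sum_{i=r+1}^{k}\ln(1-\lambda_i)$ for the trace statistic; the function names and the numerical roots are hypothetical.

```python
import math


def max_eigen_stat(lam, r, T):
    """LR statistic for H(r) against H(r+1); lam is in decreasing order, 0-based."""
    return -T * math.log(1.0 - lam[r])


def trace_stat(lam, r, T):
    """LR statistic for H(r) against H(k)."""
    return -T * sum(math.log(1.0 - l) for l in lam[r:])


lam = [0.30, 0.08, 0.01]   # hypothetical roots for k = 3
T = 100
for r in range(len(lam)):
    print(r, max_eigen_stat(lam, r, T), trace_stat(lam, r, T))
```

The trace statistic for a given $r$ is the sum of the maximum eigenvalue statistics for $r, r+1, \ldots, k-1$, and the two coincide at $r = k - 1$.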

Let $A_\perp$ be a $k \times (k - r)$ matrix such that $A_\perp'A = 0$. If $H(r)$ holds true and if $A_\perp'\mu \neq 0$, $T\sum_{i=1}^{k-r}\lambda_{r+i}$ is distributed asymptotically in the same distribution as that of the trace of (15.16), and $T\lambda_{r+1}$ is distributed asymptotically in the same distribution as that of the maximum eigenvalue of (15.16). This is proved in appendices A and B of Johansen (1991a). Notice that (15.14) $\approx T\lambda_{r+1}$ and (15.15) $\approx T\sum_{i=1}^{k-r}\lambda_{r+i}$. Johansen and Juselius (1990) call (15.14) and (15.15) respectively the maximum eigenvalue and the trace test statistics, and tabulate the null asymptotic distributions. Osterwald-Lenum (1992) recalculates and extends the tables in Johansen and Juselius (1990). The condition $A_\perp'\mu \neq 0$ means presence of a linear trend in $\{x_t\}$ as shown in Section 12.2.5. The distributions of the likelihood ratio test statistics for the case $A_\perp'\mu = 0$ are also given in Johansen (1991a). Of course they are different from those derived from (15.16). Reinsel and Ahn (1992) present an interesting aspect of the likelihood ratio, (15.15).
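The null distributions are non-standard and are obtained by simulation in practice. The sketch below is a rough Monte Carlo illustration, assuming that (15.16) is the matrix $\int_0^1 (dw)f'[\int_0^1 ff'\,ds]^{-1}\int_0^1 f(dw)'$ built from the $w$ and $f$ defined in the text; the Wiener process is approximated by a scaled random walk, and the step and replication counts are arbitrary. It is not a substitute for the published tables.

```python
import numpy as np


def simulate_limit(dim, n_steps=200, n_rep=2000, seed=0):
    """Draws of the trace and maximum eigenvalue of the assumed form of (15.16).

    dim corresponds to k - r. Returns two arrays of length n_rep.
    """
    rng = np.random.default_rng(seed)
    s = np.arange(n_steps) / n_steps                 # left end-points of the grid
    traces = np.empty(n_rep)
    max_eigs = np.empty(n_rep)
    for i in range(n_rep):
        dw = rng.standard_normal((n_steps, dim)) / np.sqrt(n_steps)
        w = np.cumsum(dw, axis=0) - dw               # w at the left end-points
        w_dm = w - w.mean(axis=0)                    # w(s) minus its integral
        f = np.column_stack([w_dm[:, :dim - 1], s - 0.5])
        f_dw = f.T @ dw                              # approximates int f (dw)'
        ff = f.T @ f / n_steps                       # approximates int f f' ds
        mat = f_dw.T @ np.linalg.solve(ff, f_dw)
        traces[i] = np.trace(mat)
        max_eigs[i] = np.linalg.eigvalsh(mat)[-1]
    return traces, max_eigs


traces, max_eigs = simulate_limit(dim=2)
print(np.quantile(traces, 0.95), np.quantile(max_eigs, 0.95))
```

When `dim` is 1 the vector $f$ reduces to the deterministic term $s - 1/2$, so the simulated statistic behaves like a $\chi^2$ variable with one degree of freedom; this gives a simple sanity check on the code.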

(15.16)



Even though the test statistics were derived from the assumption that $\varepsilon_t$ is normally distributed, the limiting distribution of the test statistic does not depend on this assumption, as is often observed in many econometric models.

While $H(0), H(1), \ldots, H(k)$ are nested (in fact the tree diagram has only one trunk and no branches), $\bar H(0), \bar H(1), \ldots, \bar H(k)$ are not nested since $\bar H(r) = H(r) - H(r-1)$. In the present problem Johansen (1992b) finds that the simple-to-general approach in $H(0), H(1), \ldots, H(k)$ does lead to the correct selection of $r$ with probability $(1 - \alpha)$ if a significance level $\alpha$ is chosen. The reason is as follows.

Suppose that $r^0$ is the true value of co-integration rank. Let us use the maximum eigenvalue test statistic, (15.14), in the simple-to-general approach. The decision rule is (i) to start with $r = 0$, and (ii) to raise $r$ whenever the test statistic $T\lambda_{r+1}$ for testing $H(r)$ against $H(r + 1)$ is larger than the critical value determined by the distribution of the maximum eigenvalue of (15.16) with the significance level $\alpha$. Throughout the trials the roots, $\lambda_1, \ldots, \lambda_k$, are the same roots of (15.10). But the numbers of rows and columns of (15.16) vary with $r$, and so does the critical value. If a tried value of $r$ is smaller than $r^0$, $\lambda_{r+1}$ converges in probability to a positive number, and $T\lambda_{r+1}$ diverges to $+\infty$ as $T \to \infty$. Therefore we must reach $r^0$ with probability 1 (if $T$ is sufficiently large). If the tried value of $r$ is equal to $r^0$, $T\lambda_{r+1}$ falls short of the critical value with probability $1 - \alpha$. We stop at $r^0$ with probability $1 - \alpha$, and go beyond $r^0$ with probability $\alpha$. (If the tried value of $r$ is larger than $r^0$, the limiting distribution of $T\lambda_{r+1}$ is not that of the maximum eigenvalue of (15.16).) In the above sequence the maximum eigenvalue test may be replaced by the trace test.
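The decision rule of this paragraph translates directly into a loop. In the sketch below the critical values are supplied by the caller, one for each tried $r$, since they depend on $k - r$ and must be taken from published tables; the numbers used in the example are placeholders invented for illustration, not tabulated values, and the function name is hypothetical.

```python
import math


def select_rank(lam, T, critical_values):
    """Simple-to-general selection of the co-integration rank.

    lam             : roots of (15.10) in decreasing order
    critical_values : critical_values[r] is the critical value for testing
                      H(r) against H(r+1), r = 0, ..., k-1 (user supplied)
    """
    k = len(lam)
    for r in range(k):
        stat = -T * math.log(1.0 - lam[r])   # maximum eigenvalue statistic
        if stat <= critical_values[r]:
            return r                         # H(r) is not rejected: stop here
    return k                                 # every H(r), r < k, was rejected


placeholder_cv = [15.0, 4.0]                 # NOT tabulated values
print(select_rank([0.30, 0.01], T=100, critical_values=placeholder_cv))
```

Replacing the statistic inside the loop by the trace statistic gives the trace-test version of the same sequence.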

The condition, $A_\perp'\mu \neq 0$, i.e. the existence of a linear trend in $\{x_t\}$, may be pre-tested for each element of $\{x_t\}$ by the univariate method given in Chapter 8 on the testing for M {] against M Q. If at least one element has a trend, it means $A_\perp'\mu \neq 0$. Johansen (1991a) shows a multivariate method to test the null hypothesis, $\rho(A) = \rho(B) = r$ and $A_\perp'\mu = 0$, against $\rho(A) = \rho(B) = r$, using a statistic distributed in the $\chi^2$ distribution with $(k - r)$ degrees of freedom under the null hypothesis. Moreover, Johansen (1992b) shows how to select $r$ in conjunction with testing whether $A_\perp'\mu = 0$ or $\neq 0$ for each $r$.

Let $\hat r$ be the value of $r$ thus selected. The maximum-likelihood estimates of $A$ and $B$, to be denoted by $\hat A$ and $\hat B$, are obtained through (15.13) and (15.12) with $r = \hat r$. The estimate of $\Omega$ follows from (15.7), and $\Gamma$ is estimated by (15.6). Johansen (1991a) proves that $\hat\Gamma$, $\hat\Omega$, and $\hat A$ are all $\sqrt{T}$-consistent and the limiting distributions of their estimation errors are standard, and that, if $B$ is somehow made identifiable (see Section 15.2.1), $\hat B$ is $T$-consistent and the limiting distribution is represented by a mixed Gaussian. As indicated in Chapter 13 the t- and F-tests on $B$ require only the standard normal and $\chi^2$ distributions. Thus a non-standard test is confined to that on the co-integration rank.

5 I am indebted to suggestions by Hiro Toda on the writing of the present paragraph.


15.1.4 Simulation studies on the co-integration rank

Toda (1994, 1995) performs a cleverly designed simulation experiment on the selection of co-integration rank. The lag order is 1 and a priori known, while the dimensionality of $x$ is 2. The results are that $T = 100$ is inadequate to detect the true co-integration rank in a wide range of values of nuisance parameters. A part of the inadequacy is analogous to the weak power of the univariate unit root tests mentioned in Section 7.3.2, which means that $r = 0$ is not rejected often enough when the true state is $r \ge 1$. But here is an added aspect due to the multiple time series. This is the point that I referred to at the beginning of Part II and in Section 13.4.3. Specification errors on the mode of deterministic trends would further aggravate the difficulty.

I am not suggesting abandoning the determination of co-integration rank, but I am warning that it may be difficult in some cases of macroeconomic applications.

15.1.5 Determination of lag orders

So far I have assumed that the lag order, $q$, is known. In practice it is not, and it has to be determined from the available data before we begin the procedure described in Sections 15.1.2-3.

Consider a group of VAR models for a k elements vector stochastic process,

Let $q^0$ be the true value of $q$, which means that $\Pi_q = 0$ for $q = q^0 + 1, \ldots, q_{\max}$. The methods available for the estimation of $q^0$ can be classified into two classes.

One is to choose the $q$ that minimizes some criterion function of $q$. Let $e_t(q)$ be the $k$ element residual vector at $t$ in the OLS calculations with $(x_{t-1}, \ldots, x_{t-q})$ as regressors, and write $S(q) = (T - q)^{-1}\sum_{t=q+1}^{T}e_t(q)e_t(q)'$. The Akaike information criterion proposed in Akaike (1973) is

and Schwarz information criterion proposed in Schwarz (1978) is

The other class of methods is the general-to-specific model selection procedure similar to the order determination in the univariate model explained in Section 6.2.3. The hypotheses $\Pi_q = 0$ are tested sequentially in the order of $q_{\max}, q_{\max} - 1, \ldots$, and one chooses the first $q$ for which $\Pi_q = 0$ is rejected. A number of methods are available to test for $\Pi_q = 0$. Sims, Stock, and Watson (1990) show that, no matter what the co-integration rank may be, the Wald test statistic for $\Pi_q = 0$ is distributed asymptotically in the $\chi^2$ distribution with $k^2$ degrees of freedom if $q \ge q^0 + 1$. Morimune and Mantani (1993) show that the likelihood ratio, $T\ln(\det[S(q - 1)]/\det[S(q)])$, is distributed likewise, and also that the likelihood ratio is better than the Wald test statistic because the empirical size is closer to the nominal size for finite $T$. Another interesting result in Morimune and Mantani (1993) is that when OLS is run on

6 Paulsen (1984) proves the consistency of the lag order selected by the Schwarz information criterion.

with $\Delta x_t$ as the dependent variable and $(x_{t-1}, \Delta x_{t-1}, \ldots, \Delta x_{t-q}, 1)$ as the independent variables, the t-value for any element of $\Gamma_1, \ldots, \Gamma_{q-1}$ is asymptotically $N(0, 1)$ if its true value is zero. This can be used to search for the highest lag order in which a given $(i, j)$ position of the $\Gamma$s is non-zero. The levels of significance must be determined to implement the general-to-specific model selection procedures.

Finally, in both of the two classes of order-determination methods the selection of $q_{\max}$ affects the performance of the methods.
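Both classes of methods can be sketched with the residual covariance $S(q)$ defined above. The code below is an illustration under stated assumptions: the criteria are written in one common textbook form, $\ln\det S(q) + 2qk^2/T$ (Akaike) and $\ln\det S(q) + qk^2\ln T/T$ (Schwarz), which may differ in normalization from the expressions in this book; an intercept is added to the regressors; and the likelihood ratio is $T\ln(\det[S(q-1)]/\det[S(q)])$. Function names and the simulated VAR(2) are hypothetical.

```python
import numpy as np


def residual_cov(x, q):
    """S(q) from OLS of x_t on (1, x_{t-1}, ..., x_{t-q})."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    y = x[q:]
    regs = [np.ones(T - q)] + [x[q - j:T - j] for j in range(1, q + 1)]
    z = np.column_stack(regs)
    coef, *_ = np.linalg.lstsq(z, y, rcond=None)
    e = y - z @ coef
    return e.T @ e / (T - q)


def lag_order_table(x, q_max):
    """Rows (q, aic, sic, lr); lr tests Pi_q = 0 and is None for q = 1."""
    T, k = np.asarray(x).shape
    rows, prev = [], None
    for q in range(1, q_max + 1):
        logdet = np.linalg.slogdet(residual_cov(x, q))[1]
        aic = logdet + 2.0 * q * k * k / T
        sic = logdet + np.log(T) * q * k * k / T
        lr = None if prev is None else T * (prev - logdet)
        rows.append((q, aic, sic, lr))
        prev = logdet
    return rows


# Simulated stationary VAR(2) with k = 2, used only to exercise the functions.
rng = np.random.default_rng(0)
T, k = 400, 2
x = np.zeros((T, k))
for t in range(2, T):
    x[t] = 0.5 * x[t - 1] + 0.4 * x[t - 2] + rng.standard_normal(k)

table = lag_order_table(x, q_max=4)
q_sic = min(table, key=lambda row: row[2])[0]
print(q_sic, table)
```

For the general-to-specific procedure one would compare each `lr` entry, starting from $q_{\max}$, with a $\chi^2$ critical value with $k^2$ degrees of freedom.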

All the methods mentioned above are investigated in a simulation study in Morimune and Mantani (1993, 1995). They adopt respectively 1%, 1%, and $1/(k^2)$% for the significance levels of the Wald, likelihood ratio, and t-tests. Some highlights of the results are that both the likelihood ratio test and t-tests are useful to determine the lag order, and that especially the t-tests should be used to detect small values in $\Gamma$ matrices. As for the information criteria, the Akaike information criterion is better than the Schwarz in retaining the order with small values in elements of $\Gamma$ matrices.

It is only when the true lag order is a priori known that the asymptotic theory for the determination of co-integration rank is valid. A theory is unavailable for the practical case where the lag order is determined from the data. In practice the method in Sections 15.1.2-3 has been used, treating the empirically determined lag order as though it were known a priori. A simulation study in Morimune and Mantani (1995) shows that different methods for the lag-order determination tend to produce an identical rank of co-integration (even though they may propose different lag orders), and that the rank shared by different methods is also identical to the co-integration rank determined in the hypothetical situation where the true lag order is known. However, this last determination of co-integration rank may err frequently with $T = 100$ as mentioned in Section 15.1.4.

Chao and Phillips (1994) adopt a Bayesian model selection procedure to determine jointly the lag order and co-integration rank.

15.1.6 A research topic

If the MA unit root test in Section 3.4 is extended from a scalar MA process to a vector MA process for $\Delta x_t$, it would meet a request from the testing of economic theories and also improve the selection of co-integration rank.


In the simple-to-general sequence of tests to determine the co-integration rank we test initially for the absence of co-integration against the presence of at most one co-integrating relation, next for at most one co-integrating relation against at most two relations, . . . . But in testing an economic theory we wish to have the null hypothesis indicate the existence of all the co-integrating relations derived from the economic theory. In terms of the notations used above we wish to test for H(r) against H(r), which is a nested test, or test for H(r) against H(r − 1), which is a non-nested test.

In Chapter 8 the discrimination between $M_0$ and $M_1$ is made by combining two encompassing judgements, $M_0 \to M_1$ and $M_1 \to M_0$. In the present problem concerning two non-nested models, H(r) and H(r − 1), H(r − 1) → H(r) can be analysed by comparing the observed value of $T\lambda_r$ against its distribution that is applicable if the DGP is in H(r − 1). In fact this comparison has been made in the simple-to-general sequence of tests. What is missing is H(r) → H(r − 1), and it should be combined with H(r − 1) → H(r). It turns out that this is also what is requested in testing the economic theories.

H(r) and H(r − 1) are equivalent respectively to $\rho(C(1)) = k - r$ and $\rho(C(1)) \ge k - r + 1$, where $C(1)$ is associated with the long-run component of $\Delta x_t$ in Section 12.1. Testing for H(r) against H(r − 1) is a non-standard problem in the maximum-likelihood theory, but it is not in the locally best invariant testing theory.$^{7}$

15.1.7 Deterministic trends

The co-integration space that nullifies the stochastic trends only has been discussed in the last paragraph of Section 12.2.5. In relation to (12.27) the condition, $A_\perp'\mu_1 = 0$, can be tested, and if a quadratic trend is found missing from $\{x_t\}$ we may proceed to the determination of co-integration rank, rewriting the model as

(15.17)

where $B^{*\prime} = [B', b_1]$, $b_1 = (A'A)^{-1}A'\mu_1$, and $x_t^{*\prime} = (x_t', t)$. The reason is as follows. Using $I_k = A(A'A)^{-1}A' + A_\perp(A_\perp'A_\perp)^{-1}A_\perp'$, i.e. $\mu_1 = Ab_1 + A_\perp c_1$ with $c_1 = (A_\perp'A_\perp)^{-1}A_\perp'\mu_1$, it is seen that $A_\perp'\mu_1 = 0$ implies $c_1 = 0$ and hence $AB'x_{t-1} + \mu_1 t = AB^{*\prime}x^*_{t-1} + \mu_1$. (The $\mu_0$ in (15.17) is $\mu_0 + \mu_1$ in (12.27).) The test and the limiting distribution are found in Johansen (1994), which also contains the test and the limiting distribution for the case where $A_\perp'\mu_1 \ne 0$. See also Osterwald-Lenum (1992) for the distribution.

7 Tanaka (1993), Harris and Inder (1992), and Leybourne and McCabe (1993) may be relevant to this problem.

From the result in Part I it is obvious that structural changes need to be introduced, which has already been revealed in Mizon (1991) and Mellander, Vredin, and Warne (1992). I illustrate the method by a one-time change in the slope of a linear trend. For each t let $\mu_t$ be a $k \times 1$ deterministic vector such that

(15.2) is replaced by

The derivation of the likelihood ratio test statistic is analogous to (15.5a-f), but the random matrix (15.16) has to be altered. Let $\lambda = (T - T_0)/T$, and consider the asymptotic theory with $\lambda$ fixed while T goes to infinity. For $s \in [0, 1]$ construct

Let $w(s)$ be the $(k - r)$ dimensional standard Wiener process, and let $f^{(1)}(s)$ be the first $(k - r - 2)$ elements of

Provided that $C(1)\mu^0 \ne 0$, $C(1)\mu^1 \ne 0$, and $C(1)\mu^0$ and $C(1)\mu^1$ are linearly independent, which implies that $r \le k - 2$, the $f(s)$ in (15.16) should be replaced by $(f^{(1)}(s)', h_1(s), h_2(s))'$. The proof is analogous to appendices A and B of Johansen (1991a). Note that $g(s)$, $h_1(s)$, and $h_2(s)$ are mutually orthogonal over [0, 1] and also orthogonal each to a constant over [0, 1]. See Hatanaka and Koto (1994) for the other modes of structural changes and Kunitomo (1994) for a general treatment of deterministic trends.

15.2 Testing for Restrictions on the Co-integration Space

15.2.1 Identification problems and normalization rules

The co-integration matrix $B'$ is not identified unless one introduces a normalization rule or a priori constraints derived from economic theories. If $B'$ is a co-integration matrix, so is $QB'$ with any non-singular $r \times r$ $Q$. (Remember, however, that $A$ must also be transformed to $AQ^{-1}$ so that $AB'$ is invariant.) It is assumed that r is known below.
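The lack of identification can be seen numerically. The sketch below uses arbitrary random matrices (the dimensions and the seed are assumptions for illustration) and checks that $AB'$ is unchanged when $B'$ is replaced by $QB'$ and $A$ by $AQ^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, r = 4, 2
A = rng.standard_normal((k, r))      # adjustment matrix
B = rng.standard_normal((k, r))      # co-integration matrix is B'
Q = rng.standard_normal((r, r))      # non-singular with probability one

A_new = A @ np.linalg.inv(Q)         # A  -> A Q^{-1}
Bt_new = Q @ B.T                     # B' -> Q B'

# The product entering the likelihood is identical,
# although the co-integration matrix itself has changed.
same_product = np.allclose(A @ B.T, A_new @ Bt_new)
same_B = np.allclose(B.T, Bt_new)
```

Since the data inform us only about $AB'$, the two pairs $(A, B')$ and $(AQ^{-1}, QB')$ cannot be distinguished.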

Different $B$s can be grouped into observationally equivalent classes. Two $B$s, $B_1$ and $B_2$, belong to the same class if and only if there exists a non-singular $r \times r$ $Q$ such that $B_2' = QB_1'$. Suppose that an economic theory postulates a set of long-run relations $B'x_t = 0$, and that we wish to test a property of $B'$. An example of the property is that $b_1$ is zero in $B' = (b_1, \ldots, b_k)$, i.e. the first element of $x_t$ is not contained in any of the long-run relations though contained in some short-run relations. This property is invariant throughout all the left multiplications of $B'$ by non-singular matrices. Thus all the observationally equivalent classes are classified into those in which all members have $b_1 = 0$ and those in which no members have $b_1 = 0$. The condition, $b_1 = 0$, is a property of (observationally equivalent) classes, and it can be tested by the data. For another example of the property of $B'$ write $B' = (b_1, \ldots, b_r)'$ and denote a subvector of $b_1$ by $b_{11}$. Consider a property, $b_{11} = 0$. This means that a portion of the variables does not appear in the first long-run relation. This property is not invariant through left multiplications of $B'$ by non-singular matrices. Each class contains those $B$s such that $b_{11} = 0$ and those $B$s such that $b_{11} \ne 0$. The condition, $b_{11} = 0$, is not a property of (observationally equivalent) classes, and cannot be tested by the data. Noting that all the members of the same class belong to the same column space of $B$ (or the same row space of $B'$), Johansen (1988, 1991a) emphasized that one could test on the co-integration space but not on the co-integration vectors. Phillips (1991d) notes that many practitioners have not understood how the tests on the co-integrating space could be implemented.
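The two examples can be contrasted in a small numerical sketch. The matrices are random and purely illustrative: a zero column of $B'$ (a variable excluded from every long-run relation) survives any left multiplication by a non-singular $Q$, whereas a zero in a single row of $B'$ does not.

```python
import numpy as np

rng = np.random.default_rng(2)
k, r = 4, 2
Bt = rng.standard_normal((r, k))     # B', rows are the long-run relations
Bt[:, 0] = 0.0                       # first variable absent from every relation
Bt[0, 1] = 0.0                       # second variable absent from the first relation only

Q = rng.standard_normal((r, r))      # arbitrary non-singular rotation
Bt_rot = Q @ Bt                      # another member of the same class

column_zero_kept = np.allclose(Bt_rot[:, 0], 0.0)    # property of the class
entry_zero_kept = bool(np.isclose(Bt_rot[0, 1], 0.0))  # property of one member only
```

Only the first restriction separates classes from one another, which is why it is the only one of the two that the data can reject.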

A number of tests for the constraints on the co-integration space have been developed in Johansen (1988, 1991a) and Johansen and Juselius (1990, 1992), and they will be presented below. It is important to confirm that the constraints are properties of the observationally equivalent classes maintaining the invariance considered above. I emphasize two points involved in the basic methodology of Johansen (1988, 1991a).

1. Restrictions to the observationally equivalent classes may be interpreted as a result of compounding two restrictions. One is a just-identifying restriction represented by a normalization rule, and the other is an overidentifying restriction. Let $F$ be a known $k \times r$ matrix such that $\rho(F) = r$. Then a normalization rule is provided by $F'B = I_r$ involving $r^2$ constraints.$^{8}$ Under the rule no $r \times r$ transformation matrices $Q$ other than $I_r$ are admitted for the transformation of $B'$ into $QB'$ to keep $AB'$ invariant. (If $F'B_0 = I_r$, $F'B_0Q' = Q'$.) Therefore there is one and only one $B'$ in a given (observationally equivalent) class that satisfies the normalization rule. In other words the rule just-identifies $B'$. Therefore any constraints that are introduced in addition to the normalization rule would lead to overidentification of $B'$. These constraints, which will be called effective in the following description of various tests, discriminate those classes which admit them and those which do not. One can test these effective constraints. The degrees of freedom are the numbers of these effective constraints. Illustrations will be provided in Section 15.2.2.

8 A seemingly more general form, $G'B = M$, can be put in the form $F'B = I_r$. $M$ is a known $r \times r$ non-singular matrix, and $G'$ is a known $r \times k$ matrix with full row rank.

2. A given restriction may be represented by a combination of different rules of normalization and different over-identifying constraints, but, in so far as the given restriction is a property of the observationally equivalent classes as defined above, the results of testing the restriction should be independent of the normalization rule adopted. Thus normalization rules are implicit in the Johansen method, but it is not important there which rules are adopted.
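The just-identifying role of the rule $F'B = I_r$ can be sketched numerically. The helper `normalize` below is a hypothetical name, and $F$ and $B$ are random matrices chosen only for illustration; the point is that every member of a class is mapped to the same representative $B(F'B)^{-1}$.

```python
import numpy as np

def normalize(B, F):
    """Return the member of B's class that satisfies F'B = I_r."""
    return B @ np.linalg.inv(F.T @ B)

rng = np.random.default_rng(3)
k, r = 5, 2
B = rng.standard_normal((k, r))
F = rng.standard_normal((k, r))      # known k x r matrix of rank r
Q = rng.standard_normal((r, r))      # non-singular with probability one

B_other = B @ Q.T                    # B' -> Q B', same class as B
rep_1 = normalize(B, F)
rep_2 = normalize(B_other, F)

unique_representative = np.allclose(rep_1, rep_2)
rule_holds = np.allclose(F.T @ rep_1, np.eye(r))
```

The rule removes exactly the $r^2$ degrees of freedom contained in $Q$, so anything imposed beyond it restricts the class itself.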

At this point it is appropriate to look back at the co-integrated regression in Chapter 13, considering the role that normalization rules might play there.

1. In Section 13.4.1 it was shown that it is the a priori knowledge about location of an $r \times r$ non-singular submatrix of $B'$ that generates a co-integrated regression. In regard to this point it is important to bear in mind that if $F'$ is not specialized beyond $r \times k$ and having a full row rank, the restriction $F'B = I_r$ does not necessarily locate an $r \times r$ non-singular submatrix of $B'$. It is a specific normalization such as $F' = (0, I_r)$ that specifies location of an $r \times r$ non-singular submatrix, which in turn generates a co-integrated regression.

2. While a general normalization selects one representative from each and every observationally equivalent class, the specific normalization such as $F' = (0, I_r)$ rejects those classes in which the last r rows of $B$ form an $r \times r$ singular matrix. The specific normalization that generates a co-integrated regression excludes some observationally equivalent classes from our consideration from the outset. Such normalizations cannot be regarded as innocuous in our inference procedures.

3. Normalization rules in general free us from worry about the lack of identification in $B$. The regression coefficients in the co-integrated regression models are identified.

Kleibergen and van Dijk (1994) present a methodology that is intermediate between the Johansen method and the co-integrated regression. They test for a co-integration rank, r, a priori specifying location of an $r \times r$ non-singular submatrix of $B'$.

15.2.2 Various tests in light of the identification viewpoints

Based on a reasoning analogous to but more complicated than Chapter 13 concerning the mixed Gaussian estimator, Johansen (1991a) shows for a known value of r that (−2) × (the log-likelihood ratio) to test any p effective constraints upon $B$ is asymptotically distributed as $\chi^2$ with p degrees of freedom. This general result may be specialized into several forms of linear hypotheses, for which the test statistics can be easily calculated. They are presented in the following subsections, 1-7, where r is known. In practice the rank selected in Section 15.1 is used without invalidating the asymptotic theory. A section below deals with a hypothesis on $A$ as well.

I repeat two important points from Section 15.2.1. (a) Two $B$s, $B_1$ and $B_2$, are observationally equivalent if and only if $B_1 = B_2P$ with some non-singular $P$; (b) a condition on $B$ can be tested empirically only when it is a property of observationally equivalent classes, namely, the condition either holds in every member of an observationally equivalent class or fails in every member of an observationally equivalent class. An alternative statement is that a condition on $B$ can be tested if and only if a $B$ that meets the condition and another $B$ that does not cannot be observationally equivalent. I shall show this testability on each one of the hypotheses that have been dealt with in the literature. Users of these testing methods should keep in mind that what they investigate are not values of parameters, which are unidentified, but observationally equivalent classes of parameter values.

The following results hold no matter what the modes of deterministic trends may be in so far as they are correctly specified and handled in Section 15.1.

1. $B = HQ$ with an a priori known $k \times r$ $H$.

Suppose that an economic theory specifies $H'x_t = 0$ as the long-run relations, where $H$ is a priori specified and $\rho(H) = r$. The $r \times r$ $Q$ is unspecified and arbitrary. This means that the base of the column space of $B$ is completely specified. Take this for the null hypothesis, and consider testing it against H(r), which is simply the statement that the co-integration rank is r but $B'$ is completely unspecified otherwise.

Let $M_H = I_k - H(H'H)^{-1}H'$. Then $M_HH = 0$. The condition of the hypothesis, $B = HQ$, is $M_HB = 0$. Concerning two $B$s, $B$ and $B^*$, suppose that $M_HB = 0$ and $M_HB^* \ne 0$. Then it is not possible to have $B^* = BP$ with a non-singular $P$. Therefore the condition of the hypothesis is a property of the observationally equivalent class.

The likelihood ratio test statistic is given in Johansen (1988) and Johansen and Juselius (1992). The number of effective constraints is derived as follows. Partition $H' = (H_1', H_2')$ and $B = (B_1', B_2')'$, where $H_1$ and $B_1$ are both $r \times r$, and adopt the normalization rule $F'B = I_r$ with $F' = [I_r, 0]$. Then $B_1$ is set to $I_r$ by the rule. Also $Q$ is now restricted to $H_1^{-1}$. The effective constraints bind $B_2$ as $H_2H_1^{-1}$. The degrees of freedom are the number of effective constraints, i.e. the number of elements of $B_2$, which is $r(k - r)$. The degrees of freedom do not depend on the particular $F$ chosen above.
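The counting argument can be checked on an example. In the sketch below $H$ and $Q$ are random matrices used only for illustration, and $H_1$ is assumed non-singular; after normalizing the upper block to $I_r$, the lower block is fully determined by $H$.

```python
import numpy as np

rng = np.random.default_rng(4)
k, r = 5, 2
H = rng.standard_normal((k, r))          # a priori known basis, rank r
Q = rng.standard_normal((r, r))          # arbitrary non-singular matrix
B = H @ Q                                # a B satisfying the null hypothesis

# M_H annihilates the column space of H, hence also B = HQ.
M_H = np.eye(k) - H @ np.linalg.inv(H.T @ H) @ H.T
null_condition = np.allclose(M_H @ B, 0.0)

# Normalization F' = [I_r, 0]: divide by the upper r x r block.
B_norm = B @ np.linalg.inv(B[:r])
H1, H2 = H[:r], H[r:]
lower_block_bound = np.allclose(B_norm[r:], H2 @ np.linalg.inv(H1))

degrees_of_freedom = r * (k - r)         # number of elements of B_2
```

With $k = 5$ and $r = 2$ the null hypothesis fixes all six elements of $B_2$, so the likelihood ratio statistic would be referred to $\chi^2$ with six degrees of freedom.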

2. $B = HQ$ with a known $k \times r_1$ $H$, where $r_1 > r$.

The a priori specified matrix $H$ is $k \times r_1$, $r < r_1 < k$, and $\rho(H) = r_1$, and $Q$ is $r_1 \times r$ but arbitrary otherwise. The null hypothesis, $B = HQ$, means that the base of the column space of $B$ is contained in the vector space spanned by columns of $H$, i.e. the column space of $B$ is contained in the column space of $H$. The constraint may be used to place a common restriction upon all columns of $B$ as seen from the following example,

In $HQ$ the first and the second elements of every column have identical absolute values with opposite signs. It can be shown in the same way as 1 above that $B = HQ$ is a property of observationally equivalent classes. See Johansen and Juselius (1992) for the derivation of the test statistic and its illustration.

The number of effective constraints is obtained as follows. Partition $B = (B_1', B_2')'$, where $B_1$ is $r \times r$, and $B_2$ is $(k - r) \times r$, and also partition

$$H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}, \qquad Q = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix},$$

where $H_{11}$, $H_{12}$, $H_{21}$, and $H_{22}$ are respectively $r \times r$, $r \times (r_1 - r)$, $(k - r) \times r$, and $(k - r) \times (r_1 - r)$, and $Q_1$ and $Q_2$ are respectively $r \times r$ and $(r_1 - r) \times r$. Assume $H_{11}$ is non-singular. Then $B_2 = H_{21}H_{11}^{-1}B_1 + (H_{22} - H_{21}H_{11}^{-1}H_{12})Q_2$. Set $B_1 = I_r$ by the normalization rule. Under the alternative hypothesis, which is H(r), the number of free parameters is the number of elements of $B_2$, i.e. $r(k - r)$. Under the null hypothesis, $B = HQ$, the number of free parameters is the number of elements of $Q_2$, which is $r(r_1 - r)$. Therefore the number of effective constraints is the difference, i.e. $r(k - r_1)$. See Section 15.5 for applications.
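The expression for $B_2$ follows from solving $B_1 = H_{11}Q_1 + H_{12}Q_2$ for $Q_1$ and substituting into $B_2 = H_{21}Q_1 + H_{22}Q_2$. A numerical check, with random $H$ and $Q$ assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
k, r, r1 = 6, 2, 4
H = rng.standard_normal((k, r1))         # known k x r1 matrix
Q = rng.standard_normal((r1, r))         # arbitrary r1 x r matrix
B = H @ Q                                # B under the null hypothesis

B1, B2 = B[:r], B[r:]
H11, H12 = H[:r, :r], H[:r, r:]
H21, H22 = H[r:, :r], H[r:, r:]
Q2 = Q[r:]

H11_inv = np.linalg.inv(H11)             # H11 assumed non-singular
B2_formula = H21 @ H11_inv @ B1 + (H22 - H21 @ H11_inv @ H12) @ Q2
formula_holds = np.allclose(B2, B2_formula)

free_alternative = r * (k - r)           # elements of B2 under H(r)
free_null = r * (r1 - r)                 # elements of Q2 under B = HQ
effective_constraints = free_alternative - free_null   # equals r (k - r1)
```

Here the eight free elements of $B_2$ under H(r) are reduced to the four elements of $Q_2$, leaving $r(k - r_1) = 4$ effective constraints.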

3. $B = (HQ, R)$ with a known $k \times r_1$ $H$ and unspecified $r_1 \times r_1$ $Q$, where $r_1 < r$, if $B$ is appropriately rotated.

The known matrix $H$ is $k \times r_1$, $0 < r_1 < r$, and $\rho(H) = r_1$. $Q$ is $r_1 \times r_1$ but arbitrary otherwise, and $R$ is $k \times (r - r_1)$ but arbitrary otherwise. The null hypothesis means that the column space of $H$ is a proper subspace of the co-integration space, in other words, the first $r_1$ columns of $B$ can be made equal to $H$ if it is properly rotated.

Let $H_\perp$ be a $k \times (k - r_1)$ matrix such that $H_\perp'H = 0$ and $\rho(H_\perp) = k - r_1$. Let $G = H(H'H)^{-1/2}$, $G_\perp = H_\perp(H_\perp'H_\perp)^{-1/2}$. Then in general

$$G_\perp'B = [0, R]P,$$

where $G_\perp'B$ and $[0, R]$ are each $(k - r_1) \times r$, $P$ is $r \times r$ and non-singular, $R$ is a $(k - r_1) \times \rho(G_\perp'B)$ matrix, and 0 is a $(k - r_1) \times (r - \rho(G_\perp'B))$ zero matrix. Multiplying $P^{-1}$ from the right it is seen that the first $r - \rho(G_\perp'B)$ columns of $BP^{-1}$ are spanned by columns of $H$. The condition of the hypothesis is that $\rho(G_\perp'B) \le r - r_1$. This is a property of the observationally equivalent class, which can be shown as follows. Suppose that $\rho(G_\perp'B) \le r - r_1$ and $\rho(G_\perp'B^*) > r - r_1$. It is impossible to have $B^* = BP^*$ with a non-singular $P^*$.

See Johansen and Juselius (1992) for the derivation of the test statistic and its illustration. Partitioning $B = (B_1', B_2')'$, where $B_1$ is $r \times r$, it is seen that the constraints bind only the submatrix consisting of the last $(k - r)$ rows and the first $r_1$ columns.

4. $B = (HQ, R)$ with a known $k \times s$ $H$ and unspecified $s \times r_1$ $Q$, where $r_1 \le s < k$, if $B$ is appropriately rotated.

The present hypothesis is an extension of the earlier Subsection 3. In fact one obtains 3 above by making $s = r_1$ in 4. This is a property of the observationally equivalent class, which can be proven in the same way as in Subsection 3. An inference theory is found in theorem 3 of Johansen and Juselius (1992). The computation uses an iteration.

5. $c'b = 0$ where $r = 1$, $B$ is denoted by $b$, and $c$ is a known non-stochastic vector.

This is clearly a property of the observationally equivalent class. A $\chi^2$ test with 1 degree of freedom is available for this special case in corollary 5.3 of Johansen (1991a).

6. $A = GP$ and $B = HQ$ with known $G$ and $H$.

Let $G$ and $H$ be respectively $k \times r_a$ and $k \times r_b$ known matrices, and let $P$ and $Q$ be respectively $r_a \times r$ and $r_b \times r$ arbitrary matrices. It is assumed that $r < r_a < k$, $\rho(G) = r_a$, $r < r_b < k$, and $\rho(H) = r_b$. Notice that $B = HQ$ has been dealt with in 2 above. The null hypothesis is that $A = GP$, $B = HQ$, and $\rho(A) = \rho(B) = r$, which is tested against H(r). A test is given in Johansen (1991a).

7. Separation.

Block-diagonality of $B'$ after a proper rotation is called the separability in Konishi and Granger (1992), which also presents a likelihood ratio test for the separability. Here I shall demonstrate that the separability is a property of the observationally equivalent class. Let $B_1'$ and $B_2'$ be respectively $r_1 \times k_1$ and $r_2 \times k_2$, where $r_1 + r_2 = r$, $k_1 + k_2 = k$. For a properly chosen $r \times r$ non-singular $P$ we have $PB' = \mathrm{diag}[B_1', B_2']$. Suppose that $B_*'$ is such that there is no $r \times r$ non-singular $P_*$ that makes $P_*B_*' = \mathrm{diag}[B_{*1}', B_{*2}']$, where $B_{*1}'$ and $B_{*2}'$ have the same numbers of rows and columns as $B_1'$ and $B_2'$ respectively. It can be shown that $B_*' = QB'$ with a non-singular $Q$ does lead to contradiction, i.e. $B_*$ and $B$ cannot be observationally equivalent. If $B_*' = QB'$, $B_*' = QP^{-1}\mathrm{diag}[B_1', B_2']$. The choice of $P_* = PQ^{-1}$ should make $P_*B_*' = \mathrm{diag}[B_1', B_2']$.

8. Wald tests and likelihood ratio tests for general hypotheses.

Johansen (1991a) proposes the likelihood ratio tests for general hypotheses on the co-integration space, but Hoffman and Rasche (1991) point out that Wald tests are available for any linear constraints on the co-integration space. The constraints are interpreted as results of compounding appropriately chosen normalization rules and some effective constraints as explained in Section 15.2.1. The limit distribution of normalized $\hat{B}$, which is mixed Gaussian, is given in Johansen (1991a, theorem C.1). The null distributions of the Wald test statistics are asymptotically $\chi^2$ just as those of the likelihood ratio test statistics are. See Kunitomo (1994) for a full investigation of the Wald and Lagrange multiplier tests on the co-integrated VAR model.

My final remark on the testing is that procedures for testing constraints upon $(A, \Gamma_1, \ldots, \Gamma_{q-1})$ in (15.2) are standard because $\sqrt{T}$ × (the estimation errors) are asymptotically Gaussian. Even the constraints upon $(A, B, \Gamma_1, \ldots, \Gamma_{q-1})$ can be tested in a standard way with $\hat{B}$ replacing $B$ in the constraint because $\hat{B} - B$ is $O_p(T^{-1})$.


15.3 A Common Practice on B'

A common practice in publishing applications of the Johansen method is to present $\hat{B}'$ in (15.12a), often normalizing one element of each row to unity. It has been noted above that $B$ is not identified without some $r^2$ constraints. As emphasized in Phillips (1991d), $\hat{B}'$ is an estimate of unidentified parameters, and hence devoid of any meaning.

Let us recall how $\hat{B}'$ has been obtained. It was derived from minimizing the expression (15.9). But the minimization cannot determine $B$ uniquely because (15.9) is invariant through transformations from $B$ to $BQ$ with non-singular $Q$. Therefore (15.12) is only one of many ways to determine $B$. Admittedly (15.12b) is a kind of normalization. But it has only $r(r + 1)/2$ constraints, not sufficient for the unique determination of $B$. Moreover it means nothing from the standpoint of economic theory. Let $(w_1, \ldots, w_k)$ be the orthonormalized eigenvectors of $S_{11}^{-1/2}S_{10}S_{00}^{-1}S_{01}S_{11}^{-1/2}$, and let $(v_1, \ldots, v_k) = S_{11}^{-1/2}(w_1, \ldots, w_k)$. Then $\hat{B}$ is $(v_1, \ldots, v_r)$. This is an arbitrary way to determine $B$.
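The eigenvector construction can be sketched in a few lines. The example below is illustrative only: it assumes the simplest case with no short-run terms, so the product-moment matrices are formed directly from $\Delta x_t$ and $x_{t-1}$, and it takes the normalization in (15.12b) to be $\hat{B}'S_{11}\hat{B} = I_r$, which is what the construction delivers. The names `inv_sqrt` and `johansen_vectors` are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    d, U = np.linalg.eigh(S)
    return U @ np.diag(d ** -0.5) @ U.T

def johansen_vectors(z0, z1, r):
    """Eigenvalues and the first r vectors v_i = S11^{-1/2} w_i."""
    T = z0.shape[0]
    S00 = z0.T @ z0 / T
    S01 = z0.T @ z1 / T
    S11 = z1.T @ z1 / T
    S11_is = inv_sqrt(S11)
    M = S11_is @ S01.T @ np.linalg.solve(S00, S01) @ S11_is
    M = (M + M.T) / 2                       # remove rounding asymmetry
    lam, W = np.linalg.eigh(M)
    order = np.argsort(lam)[::-1]           # largest eigenvalue first
    lam, W = lam[order], W[:, order]
    V = S11_is @ W
    return lam, V[:, :r], S11

rng = np.random.default_rng(6)
T, k, r = 200, 3, 2
x = np.cumsum(rng.standard_normal((T + 1, k)), axis=0)   # illustrative I(1) data
lam, B_hat, S11 = johansen_vectors(x[1:] - x[:-1], x[:-1], r)

# Any orthogonal rotation satisfies the same r(r+1)/2 constraints.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B_rot = B_hat @ Q
```

Both `B_hat` and `B_rot` satisfy $B'S_{11}B = I_r$ and span the same space, which illustrates why the reported vectors are only one arbitrary member of their class.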

One might argue that $\hat{B}'$ is presented to let the readers perform any rotations of $\hat{B}'$ that they desire. I do not deny its usefulness for informal judgement, but the final judgement should be based upon the hypothesis testing described in Section 15.2.2.

Admittedly the common practice criticized above does reflect some demand from econometricians. Johansen's method certainly enables us to test for constraints on the co-integration space, but if such constraints are not rejected we would like to proceed to estimating the parameters using the constraints. For the estimation of $B$ we need constraints strong enough to identify $B$. The constraints that represent properties of observationally equivalent classes as given in Section 15.2.2 are not sufficient for the purpose. Not much has been investigated in the literature.

Incidentally, the impulse-response function does involve $AB'$ but does not involve $A$ or $B$ separately. It is free from the identification problem. See Lütkepohl and Reimers (1992) and Mellander, Vredin, and Warne (1992) on the estimation of the impulse-response function using the error-correction form of co-integration.

15.4 Weak Exogeneity and Granger Non-causality

15.4.1 A relation among the weak exogeneity, Granger non-causality in the long-run, and the co-integrated regression

Conditional models are frequently used in econometric models, and the weak exogeneity condition enables us to perform efficient testing on such models (see Section 11.2). Suppose that the vector time series $\{x_t\}$ in (15.2) with co-integration rank r is partitioned into $(x_{1t}', x_{2t}')'$, $x_{1t}$ and $x_{2t}$ having respectively $k_1$ and $k_2 = k - k_1$ elements. For simplicity of exposition $\mu$ is suppressed to zero, q is assumed 2, and $\Gamma_1$ is written $\Gamma$. The first $k_1$ rows and the last $k_2$ rows of $\Gamma$ and $A$ are denoted respectively $\Gamma_1$ and $A_1$ and $\Gamma_2$ and $A_2$. $\Omega$ is partitioned conformably as

Let us write the original model in the partitioned form,

(15.18a)

(15.18b)

The parameters are $A_1$, $A_2$, $B'$, $\Gamma_1$, $\Gamma_2$, $\mathrm{vech}(\Omega_{11})$, $\Omega_{12}$, $\mathrm{vech}(\Omega_{22})$, where $\mathrm{vech}(\cdot)$ is the vector of elements of the matrix in ( ) eliminating double-counting. Let us also partition $B'$ as $[B_1', B_2']$.

The conditional and the marginal models were derived in (11.15) and (11.16). Though $k_1 = k_2 = 1$ there, its extension to general cases is straightforward. The marginal log-likelihood function is seen equal to

(15.19a)

and the conditional log-likelihood function is

(15.19b)

where

The model of $x_{2t}$ conditioned on $x_{1t}$ is

(15.20)

where $w_t = \varepsilon_{2t} - H_{21}\varepsilon_{1t}$ is i.i.d. with $N(0, \Omega_{22.1})$. The marginal model of $x_{1t}$ is (15.18a).

It is difficult to envisage the weak exogeneity unless the relevant parameters are at least just-identified. Since $\rho(B) = r$ there must be at least one $r \times r$ submatrix of $B'$ that is non-singular. Denote it by $B_F'$. We shall assume that a priori information is available to determine which columns of $B'$ compose a $B_F'$. It was shown in Section 13.4 that this information enables us to identify the entire $B'$ uniquely by normalizing $B_F'$ to a unit matrix.

What is important from the practical point of view is the case where $B_2'$, which is $r \times k_2$, contains a $B_F'$. (This is possible only when $k_2 \ge r$.) It will be shown for this case that $A_1 = 0$ is equivalent to both the Granger non-causality in the


long run and the weak exogeneity. I shall begin with the Granger non-causality. As shown in Section 12.2.6, $A_1B_2' = 0$ is an important part of the condition for $x_2$ failing to cause $x_1$ in Granger's sense, and may be regarded as non-causality in the long run. Hunter (1992) calls it the co-integrating exogeneity, but I shall adopt the term Granger non-causality in the long run. As explained in Hunter (1992), it means that $x_{2t}$ does not contribute to the long-run forecasts of $x_{1t}$. If $B_2'$ contains a $B_F'$ then $A_1B_2' = 0$ is equivalent to $A_1 = 0$. This is because $A_1B_2' = 0 \Rightarrow A_1B_F' = 0 \Rightarrow A_1 = 0$, and obviously $A_1 = 0 \Rightarrow A_1B_2' = 0$.

Johansen (1992c, 1992d) state that $A_1 = 0$ is necessary and sufficient for the weak exogeneity of $x_{1t}$. Indeed if $A_1 = 0$, the marginal log-likelihood function, (15.19a), has the parameters $\Gamma_1$ and $\mathrm{vech}(\Omega_{11})$ only, while the conditional log-likelihood function, (15.20), has $A_2$, $B'$, $G_2$, $H_{21}$, $\mathrm{vech}(\Omega_{22.1})$, where a submatrix of $B'$, $B_F'$, is normalized. Johansen (1992c) demonstrates that the two groups of parameters are variation-free when $A_1 = 0$. (15.18a) and (15.20) are now respectively

(15.18')

(15.20')

Even though $\Delta x_{2t-1}$ is involved in the marginal model, it is irrelevant to the problem as shown in the explanation associated with (11.9). The $x_{1t-1}$, $\Delta x_{1t-1}$, and $\Delta x_{1t}$ are weakly exogenous in (15.20') with respect to not only $B'$ but also $A_2$, $G_2$, $H_{21}$, and $\mathrm{vech}(\Omega_{22.1})$. The necessity of $A_1 = 0$ for the weak exogeneity is stated in Johansen (1992d).
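The split into a marginal and a conditional model rests on removing from $\varepsilon_{2t}$ the part that is linearly explained by $\varepsilon_{1t}$. The sketch below assumes the usual definitions $H_{21} = \Omega_{21}\Omega_{11}^{-1}$ and $\Omega_{22.1} = \Omega_{22} - \Omega_{21}\Omega_{11}^{-1}\Omega_{12}$ (they are assumptions here, consistent with $w_t = \varepsilon_{2t} - H_{21}\varepsilon_{1t}$), and verifies on an arbitrary covariance matrix that $w_t$ is uncorrelated with $\varepsilon_{1t}$. The function name `conditional_blocks` is hypothetical.

```python
import numpy as np

def conditional_blocks(omega, k1):
    """Return H21 and Omega_{22.1} for a covariance matrix partitioned at k1."""
    o11, o12 = omega[:k1, :k1], omega[:k1, k1:]
    o21, o22 = omega[k1:, :k1], omega[k1:, k1:]
    H21 = o21 @ np.linalg.inv(o11)
    omega_22_1 = o22 - H21 @ o12
    return H21, omega_22_1

rng = np.random.default_rng(7)
k, k1 = 4, 2
L = rng.standard_normal((k, k))
omega = L @ L.T + np.eye(k)              # an arbitrary positive definite Omega

H21, omega_22_1 = conditional_blocks(omega, k1)

# w_t = S eps_t with S = [-H21, I]; its moments follow from Omega directly.
S = np.hstack([-H21, np.eye(k - k1)])
var_w = S @ omega @ S.T                  # should equal Omega_{22.1}
cov_w_eps1 = S @ omega[:, :k1]           # should be a zero matrix
```

Because the disturbance of the conditional model is uncorrelated with that of the marginal model, the joint likelihood factorizes, and weak exogeneity then turns on whether the two factors share parameters, which is where $A_1 = 0$ enters.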

When $A_1 = 0$ the maximum-likelihood estimation of the parameters in (15.20') can be based upon (15.20') only. In reference to (15.4) we now set $z_{0t} = \Delta x_{2t}$, $z_{1t} = x_{t-1}$ (not $x_{2t-1}$), $z_{2t} = (\Delta x_{1t}', \Delta x_{t-1}')'$, $A = A_2$, $\Gamma = (H_{21}, G_2)$, $\Omega = \Omega_{22.1}$. The rank r is determined from the conditional model, and a testing on the co-integration space is also performed on the conditional model.

Let us consider what is meant by the condition $A_1 = 0$. When $A_1 = 0$, (15.18a) involves differenced variables only, and hence it cannot represent a co-integrated regression. (Summing over time on each side of (15.18a) generates a random walk for the disturbance.) The co-integration is confined to the conditional model, (15.20), between $x_1$ and $x_2$. We have had a similar situation in Section 13.4.2. Moreover, when $A_1 = 0$, the upper half of $AB'$ is zero, and $B'x_{t-1}$ does not subject $\Delta x_{1t}$ to the adjustment toward equilibrium (see Section 12.3.3).

In the present context of $B_2'$ containing a $B_F'$ it is impossible to have $k_1 > k - r$, i.e. more than $(k - r)$ variables cannot be weakly exogenous. Therefore pragmatic advice on the inference procedure is to begin with the determination of co-integration rank in the entire system. If it is determined as $r_0$, then at most $(k - r_0)$ variables are candidates for the weakly exogenous variables. The next step of our inference is to test for $A_1 = 0$. Johansen and Juselius (1990, 1992) show that a standard $\chi^2$ test is available.


Continuing with the assumption, $A_1 = 0$, an important special case is $k_1 = k - r$, i.e. $r = k_2$. Regarding $B' = (B_1', B_2')$, $B_1'$ and $B_2'$ have $k - r$ and $r$ columns respectively. Co-integrating relations provide r equations to be solved for r elements of $x_2$ because $B_2'$ is now $B_F'$ and hence non-singular. Thus $x_1$ and $x_2$ are respectively exogenous and endogenous variables.

Johansen (1992c) notes that, if $r = k_2$, (15.20) or (15.20') is no longer a reduced-rank regression, because all $rk$ elements of $A_2B'$ are free to move in the $rk$ dimensional Euclidean space.$^{9}$ The process of concentrating the log-likelihood is much simplified when applied to this special case of the conditional model. Setting $B_2' = I_r$, (15.20') can be rewritten as

(15.21)

$A_2^{-1}G_2$, and $v_t \equiv -A_2^{-1}w_t$. (15.21) is analogous to the co-integrated regression considered in Chapter 13. From (15.21) we see that $\Delta x_{2t} = C\Delta x_{1t}$ + terms in $I(-1)$, and on substituting it in (15.18') we get

which indicates that the long-run component of $\{x_{1t}\}$ can be expressed by $\{\varepsilon_{1t}\}$ only. Since $v_t = -A_2^{-1}(\varepsilon_{2t} - H_{21}\varepsilon_{1t})$, it is possible to show that the OLS estimator of $C$ in (15.21) is mixed Gaussian. In fact this also follows from the theory of the present chapter, because the inference on the basis of the conditional model (15.21) must be identical to the inference on the basis of the entire system (15.2) under the weak exogeneity. Thus it can be said that the co-integrated regression is essentially a special case where $(k - r)$ variables are weakly exogenous. This is what I meant in Section 13.4.3, where the materials in Chapter 13 are said to be subsumed in Chapter 15.

Nevertheless I gave a certain role to the co-integrated regression in Section 13.4.3 in view of the limited capability that the maximum-likelihood method of VAR has with the length of time-series data available for macroeconomic studies. This was shown in Section 15.1.4.

It is seen from the above development that $A_1 = 0$ continues to be necessary and sufficient for the weak exogeneity even if $B_2'$ does not contain a $B_F'$, in which case, however, $A_1 = 0$ is not related either to Granger non-causality in the long run or to the co-integrated regression with $x_2$ as the dependent variable. I do not think that this is an interesting case to consider in practice.

15.4.2 Partitioning the co-integrating space

It makes sense to partition B as

9 There exist $r(k - r)$ free parameters in $B'$ after the normalization, and the entire $r^2$ elements of $A_2$ are free.

provided that just-identifying constraints are introduced in the co-integration space. In some applications we are interested in the co-integrating relations, $B_b'x_t = 0$ only under the supposition that neither the just-identifying constraints nor any other constraints relate $B_b'$ to $B_a'$. The just-identifying constraints in no way bind $A$, which is however identified.

Banerjee et al. (1993: 288-91) investigate the above partitioning in conjunction with the partitioning of $x_t' = (x_{1t}', x_{2t}')$ and the conditioning of $x_{2t}$ upon $x_{1t}$. Thus $A$ and $B$ are further partitioned as

and (15.18a) and (15.18b) are rewritten as

(15.22a)

(15.22b)

The marginal model is (15.22a), and the conditional model is

(15.23)

Suppose that the parameters of interest are $(A_{2b}, B_{1b}', B_{2b}')$. A sufficient condition to make $\Delta x_{1t}$ and $\Delta x_{1t-1}$ weakly exogenous in the conditional model is that $A_{1b} = 0$ and $A_{2a} - H_{21}A_{1a} = 0$. The parameters in the marginal model are $A_{1a}$, $B_{1a}'$, $B_{2a}'$, $\Gamma_1$, $\mathrm{vech}(\Omega_{11})$, and the parameters in the conditional model are $A_{2b}$, $B_{1b}'$, $B_{2b}'$, $H_{21}$, $G_2$, $\mathrm{vech}(\Omega_{22.1})$. The two sets of parameters are variation-free. The co-integrating relation $B_a'x_t = 0$ appears in the marginal model but not in the conditional model, and $B_b'x_t = 0$ appears in the conditional model but not in the marginal model. How to test for $A_{1b} = 0$ and $A_{2a} - H_{21}A_{1a} = 0$ is described briefly also in Banerjee et al. (1993: 290).

15.4.3 Granger non-causality

Let us turn to Granger non-causality. The concept was explained in Section 11.3 on two types of VAR models, the stationary one and the one that is non-stationary and not co-integrated, and also in Section 12.2.6 on the VAR that is non-stationary and co-integrated. The explanation dealt with the case, $k_1 = k_2 = 1$, but its extension is straightforward. In all cases Granger non-causality is the block triangularity of the VAR system. Here we are concerned with testing. Mosconi and Giannini (1992) implement the likelihood ratio test by an iteration procedure, and Toda and Phillips (1993, 1994) analyse the limiting distributions of the Wald test statistics. I shall present the results of Toda and Phillips (1993, 1994).


Suppose that $x_t' = (x_{1t}', x_{2t}', x_{3t}')$, where $x_{1t}$, $x_{2t}$, and $x_{3t}$ have respectively $k_1$, $k_2$, $k_3$ elements, and that we are concerned with Granger causality of $x_3$ upon $x_1$. The null hypothesis is the absence of causality. Toda and Phillips (1993, 1994) consider two types of Wald test. One is the F-test based on the OLS of the ordinary VAR form,

(15.24)

Partitioning

conformably with the partitioning of $x_t$, the null hypothesis is $\Pi_{i,13} = 0$, $i = 1, \ldots, q$. (However, the limiting distribution of the F-statistics is derived on the assumption that the data-generating process is a possibly co-integrated VAR.) The other Wald test is based on the reduced-rank estimator of the error-correction form, (15.2). The null hypothesis is $\Gamma_{i,13} = 0$, $i = 1, \ldots, q - 1$ and $A_1B_3' = 0$, where $\Gamma_{i,13}$ is analogous to $\Pi_{i,13}$ above, and $A = (A_1', A_2', A_3')'$, $B' = (B_1', B_2', B_3')$. Derivation of the second Wald test is not as straightforward as many other problems in econometrics (see Toda and Phillips (1993)).

We have $\rho(B_3') = k_3$ if columns of $B_3$ are contained in a non-singular $r \times r$ $B'F$. Toda and Phillips (1991) show that, provided that $\rho(B_3) = k_3$, the F-statistic based on (15.24) is distributed asymptotically in $\chi^2$ with degrees of freedom equal to the number of constraints under the null hypothesis. Otherwise the limiting distributions are complicated. For example, if $k = 3$, $k_1 = k_2 = k_3 = 1$, $r = 1$, and $b_3 \neq 0$, then the F-statistic is $\chi^2$.¹⁰ But it is not if $k = 3$, $k_1 = 1$, $k_2 = 0$, $k_3 = 2$, and $r = 1$. As for the test based upon the reduced-rank statistics, two alternative sufficient conditions for the null limiting distributions being $\chi^2$ are $\rho(A_1) = k_1$ and $\rho(B_3) = k_3$.

It may be useful to test for $A_1B_3' = 0$, which is a case of Granger non-causality in the long run.¹¹
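The first of the two Wald tests can be illustrated numerically. The following is only a minimal sketch, not the procedure of Toda and Phillips: it assumes a stationary VAR (so that the $\chi^2$ limit holds without the rank conditions discussed above), includes a constant term, and uses the hypothetical function name `granger_wald` and an arbitrary simulated coefficient matrix `A`.

```python
import numpy as np

def granger_wald(x, cause_cols, effect_col, q):
    """Wald statistic for H0: the q lags of x[:, cause_cols] have zero
    coefficients in the OLS equation for x[:, effect_col] of a VAR(q)."""
    T, k = x.shape
    # Regressors: a constant followed by lag 1, ..., lag q of all k variables.
    lags = [x[q - i:T - i] for i in range(1, q + 1)]
    Z = np.hstack([np.ones((T - q, 1))] + lags)
    y = x[q:, effect_col]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    s2 = resid @ resid / (Z.shape[0] - Z.shape[1])
    cov = s2 * np.linalg.inv(Z.T @ Z)
    # Column of variable j at lag i+1 is 1 + i*k + j.
    idx = [1 + i * k + j for i in range(q) for j in cause_cols]
    b = coef[idx]
    V = cov[np.ix_(idx, idx)]
    wald = float(b @ np.linalg.solve(V, b))
    return wald, len(idx)

# Simulated stationary VAR(1); the matrix A is an illustrative assumption.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.4, 0.3],
              [0.0, 0.0, 0.6]])
T = 500
x = np.zeros((T, 3))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.standard_normal(3)

# Lags of x3 are absent from the x1 equation (null true) but present in the x2 equation.
wald_null, df_null = granger_wald(x, cause_cols=[2], effect_col=0, q=2)
wald_alt, df_alt = granger_wald(x, cause_cols=[2], effect_col=1, q=2)
```

Under the null the statistic would be compared with a $\chi^2$ critical value with `df_null` degrees of freedom; with co-integrated data that comparison is justified only under the conditions stated in the text.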

15.5 Applications

There are a large number of empirical studies using the method in Johansen (1991a). The following survey describes (i) difficulties that have been recognized in the course of implementing the method and (ii) contributions toward the empirical testing of economic theories.

King et al. (1991) investigate a particular version of the real business cycle theory, in which a series of shocks to the productivity growth generates the business cycle while driving the logarithms of output (y), consumption (c),

10 This is the case which Sims, Stock, and Watson (1990) analysed.
11 Granger and Lin (1995) propose a measure of causality in the long run.


and investment (i) in a balanced growth. In applying the Johansen method $x_t'$ is $(y_t, c_t, i_t)$, and the data are seasonally adjusted for post-war USA.¹² It is confirmed that $r = 2$, and that one way to represent the co-integration space is

Therefore $k - r = 1$, $B_\perp'$ may be taken as $(1, 1, 1)$, and since $\rho(C(1)) = 1$, $H_2' = A_\perp'$ may be taken as $(1, 0, 0)$. $C(1)$ is $(1, 1, 1)'(1, 0, 0)$ (see Section 12.2.3). Let $\varepsilon_{1t}$ be the first element of $\varepsilon_t$. The $\varepsilon_{1t}$ can be interpreted as the shock to the productivity growth rate. The contribution of the common trend to $x_t$ is $C(1)H_2\sum_{s=1}^{t}\varepsilon_s$ in (12.6), which is $(1, 1, 1)'\sum_{s=1}^{t}\varepsilon_{1s}$ in the present case.
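A small numerical sketch may help to see what the rank-one $C(1)$ implies. The basis `B` of the co-integration space used below (the log ratios y - c and y - i) is an assumption made for illustration, as is the simulated shock series; only $C(1) = (1, 1, 1)'(1, 0, 0)$ is taken from the text.

```python
import numpy as np

B_perp = np.array([[1.0], [1.0], [1.0]])   # B_perp' = (1, 1, 1)
A_perp = np.array([[1.0], [0.0], [0.0]])   # A_perp' = (1, 0, 0)
C1 = B_perp @ A_perp.T                     # C(1) = (1, 1, 1)'(1, 0, 0)

# Hypothetical basis of the co-integration space under balanced growth:
# columns correspond to y - c and y - i.
B = np.array([[1.0, -1.0, 0.0],
              [1.0, 0.0, -1.0]]).T

# Simulated shocks (illustrative only); the common trend is C(1) times
# the cumulated shocks, so only the first shock accumulates.
rng = np.random.default_rng(0)
eps = rng.standard_normal((200, 3))
trend = np.cumsum(eps, axis=0) @ C1.T
first_shock_sum = np.cumsum(eps[:, 0])
```

Each column of `trend` equals the cumulated first shock, and `B.T @ C1` is a zero matrix, so the assumed co-integrating combinations annihilate the common trend.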

For readers interested in the structural VAR model and also the identification problem I mention that King et al. (1991) in fact start with a simultaneous equations model. They impose a priori the balanced growth in $x_t$. Since $x_t$ is I(1), it implies the co-integration space given above. They further impose an a priori condition that among three shock variables the shock to productivity growth is uncorrelated with the other two. Then it is proven that these two a priori conditions together identify all the parameters in the simultaneous equations. The Johansen method was used afterwards to confirm the co-integration property a priori imposed earlier. Note that a particular structure of the co-integration space contributes to the identification.

King et al. (1991) ask themselves what percentage of the observed variations in y, c, and i are accounted for by the single common stochastic trend just found. The variations are represented by the error of prediction in different horizons. For the horizon $\tau$ the error is

(15.25)

in the MA representation (12.1). Diagonal elements of the covariance matrix of (15.25) are each compared with the variance of $(1, 1, 1)'\sum_{h=0}^{\tau-1}\varepsilon_{1,t+\tau-h}$ for $\tau = 1, 2, \ldots$. The results on US post-war quarterly data are that relative contributions of the common trends are large in y and c but not in i. The above measure of variability is often used in econometrics, and the similar analysis is expected to be useful in many econometric analyses.

King et al. (1991) then consider a six-variables system in which the inflation rate, nominal interest rate, and real money stock are added to (y, c, i). Macroeconomic theories suggest two additional co-integrating relations, the money demand equation, and the Fisher effect. That $r = 3$ in the six-variables system is confirmed by the statistical analysis, but relative contributions of the common stochastic trend produced by the shock to productivity growth are much reduced from the three-variables system.

12 The invariance of the co-integration matrix before and after the seasonal adjustment is shown nicely in Banerjee et al. (1993: 301-2). It assumes that the adjustment is a linear filter, while the method in practical use, such as X-11, is not. I imagine that the non-linear effect is negligible.


Mellander, Vredin, and Warne (1992) also investigate the real business cycle theory, but differ from King et al. (1991) in adding the terms of trade to (y, c, i) and in using the annual, historical data of Sweden. Moreover their deterministic trends include structural changes. Kunst and Neusser (1990) and Neusser (1991) extend King et al. (1991) to many countries, not only within each country but also across the countries with the real interest rate added to (y, c, i).

The long-run and short-run market equilibrium conditions have been investigated by the co-integration analysis. Juselius (1991) and Johansen and Juselius (1992) analyse the macroeconomic aspects of the exchange rate incorporating the PPP and UIP (uncovered interest parity). Johansen and Juselius (1992) study the UK quarterly data from 1972 I to 1987 II so that T is as small as 62. The system has five variables: UK price ($x_1$), the world price ($x_2$), UK exchange rate vis-à-vis all other currencies ($x_3$), UK interest rate ($x_4$), and Eurodollar interest rate ($x_5$). To these is added the oil price as the exogenous variable, because it eliminates non-stationarity in the variances of disturbances. As for the co-integration rank the trace test and the maximum eigenvalue test give $r = 2$ and $r = 0$ respectively. They chose $r = 2$. A test for weak exogeneity suggests that the world price is, but Eurodollar interest rate is not, weakly exogenous. Method 2 in Section 15.2.2 is employed to test for the null hypothesis that the two-dimensional co-integration space has the particular structure $(1, -1, 1, \times, \times)$, where $\times$ means unspecified. The hypothesis is accepted. Then method 3 in the same section is employed to test for the hypothesis that $(1, -1, 1, 0, 0)$ is included in the co-integration space. The hypothesis is rejected. This means that the PPP relation appears in both of the two co-integrating relations, but the relation is accompanied in both by the interest-rate variables, i.e. the PPP cannot stand by itself. I think this is a new, interesting evaluation of the PPP. It is found that the UIP can stand by itself.¹³ Hunter (1992) re-examines Johansen and Juselius (1992) especially in regard to the role of oil price.

The co-integration analysis of behaviour equations is feasible only when the equations are identified. But some co-integrating relationships are implications of the specifications of behaviour equations (for example, on the selection of variables), and the equations may be tested by way of testing for the relationships.

Hoffman and Rasche (1991) examine the long-run relationship between real money, real income, and either short-term or long-term interest rates, acknowledging in a footnote that the relationship is the demand for money only if the latter is identified. (It is possible that the demand function is identified in the long run but not in the short run.) The data are the post-war monthly data for the USA. The long-run relationship involving a short-term interest rate is confirmed, but that involving a long-term interest rate is not. McNown and Wallace (1992) are concerned with what is called the McKinnon hypothesis on the demand for money, i.e. the stability of the demand function requires

13 Kugler and Lenz (1993) analyse the monthly data for PPP. Diebold, Gareazabal, and Yilmaz (1994) find that different daily nominal exchange rates are co-integrated without $\mu$ in (15.2), but not co-integrated with $\mu$.


exchange rate as an explanatory variable (see McKinnon et al. (1984) for this hypothesis). On the US quarterly data from 1973 II to 1988 IV the P-values of the statistics to test for $r = 0$ are compared between the systems with and without the exchange rate. The hypothesis is supported. I think this is an interesting application of the co-integration analysis that it is hoped may be useful in many branches of applied econometrics. However, the inference procedure in McNown and Wallace (1992) should be modified. Consider a four-variables system, $(m, y, i, f)$, where f is the exchange rate. If r is found equal to zero, there is no money demand function. If r is found $\le 1$, apply 5 of Section 15.2.2 with $c' = (0, 0, 0, 1)$. If r is found $\le r_0$ and $r_0 \ge 2$, apply 4 of the same section with $H = [I_3, 0]'$ and a $3 \times 1$ matrix Q. The four-variables system is used throughout the study. The results on the three variables $(m, y, i)$ are hard to interpret as the effect of f is not made explicit.¹⁴ The Fisher effect specifically upon long-term interest rates has been examined in Wallace and Warner (1993), using post-war US quarterly data. Co-integration between the inflation rate and long-term rates is confirmed by the maximum eigenvalue test.

It has been pointed out in Section 11.1 that if feasible simultaneous equations models are a logical specialization of feasible (standard) VAR models the over-identifying constraints in the former models can be tested against time-series data. This kind of approach has been pursued in Mizon (1991), Clements and Mizon (1991), and Hendry and Mizon (1993) in the framework of the model selection methodology of the British School, including the encompassing, which is described most fully in Hendry and Mizon (1993). Since the VAR may possibly be co-integrated they estimate the VAR by the Johansen method. After the rank is determined, the constraints on the simultaneous equations models can be expressed as those on $(A, B, \Gamma_1, \ldots, \Gamma_{q-1})$ in (15.2), and the constraints can be tested by traditional likelihood ratios dealing with the estimation errors in the standard $O_p(T^{-1/2})$. Placing the error-correction term at $(t - q)$ rather than at $(t - 1)$ as in (15.2), they consider an autoregressive form, with order 1, of $\xi_t = (\Delta x_t, \ldots, \Delta x_{t-q+1}, B'x_{t-q})$. It plays a role of a bridge between the VAR and the simultaneous equations model. This autoregressive form is also useful to perform diagnostic investigations of the estimated VAR, because $\{\xi_t\}$ is I(0), the autoregressive form is stable, and $B$ may be replaced by $\hat B$ (see equation (5) of Hendry and Mizon (1993)).

Hendry and Mizon (1993) analyse four variables $(m - p, \Delta p, y, r)$, where r is the logarithm of interest rate. The data are seasonally adjusted quarterly UK data. A preliminary univariate analysis shows that $m - p$, y, and r are TS around linear trends and $\Delta p$ is I(1), i.e. p is I(2). They formulate an error-correction form of VAR on $(m - p, \Delta p, y, r)$. The Johansen method shows that $r = 2$. The estimated model successfully passes all the diagnostic tests including the constancy of parameters. However, some dummy variables are introduced to deal with regime changes. The VAR can be a basis on which simultaneous

14 See also Hafer and Jansen (1991) for the demand for money.


equations models are to be judged. One such model is not rejected.

In similar fashion Mizon (1991) analyses the relation between aggregate price change and relative price variability in conjunction with real earnings, logarithms of average hours of work, productivity and unemployment. Moreover Clements and Mizon (1991) investigate $(e - r, \Delta r, p - a, a, u)$ where e, r, p, a, and u are respectively earnings per hour worked, CPI, productivity, average hours worked, and unemployment, all in logarithms. A large number of dummy variables is introduced.¹⁵

Johansen (1992d) shows that the income (y) and the rate of interest (r) are weakly exogenous in the study of money demand in Hendry and Mizon (1993) discussed above. Brodin and Nymoen (1992) apply the weak exogeneity test in the system of the logs of consumption ($x_1$), income ($x_2$), and wealth ($x_3$). The co-integration rank is found equal to 1, not 2 as one might suspect. The adjustment vector, $a = (a_1, a_2, a_3)'$, is such that it is only $a_1$ that is significantly different from zero. Thus income and wealth are weakly exogenous.

Friedman and Kuttner (1992) find that the relationship among money, real or nominal income, interest rate, and price observed prior to 1980 in the USA ceased to hold after 1980. The structural changes seem to be important in many fields of time series for economic studies.

I now turn to the financial markets. Baillie and Bollerslev (1989), McDermott (1990), and Copeland (1991) analyse daily data of spot rates in foreign-exchange markets. As indicated in Section 12.3.3 existence of a co-integrating relation contradicts the efficient-market hypothesis, and a day should be an appropriate time unit for this consideration. Absence of co-integration is supported in Copeland (1991) but not in the other studies mentioned above. Cross-re-examinations are needed. Baillie and Bollerslev (1989) and Copeland (1991) analyse also the forward rates and their relations to the spot rates in foreign-exchange markets.

Kasa (1992) finds stock price in different countries driven by a single common trend. Quarterly data are found more efficient than monthly data in determining the co-integration rank. It might be interesting to analyse the point in terms of the framework that Toda (1995) adopts for his simulation. Selection of lag order is found important in Kasa (1992). Hall, Anderson, and Granger (1992) investigate spreads among the yields of assets with different maturities in the US monthly data. All the spreads are driven by a single common trend as they should be if the present-value model holds true. However, the study is somewhat handicapped by the regime change in September 1979 and October 1982 and the drastic changes in the volatility, much greater volatility in 1980-3 than in the remaining periods.

The following conclusions emerge. The major difficulties that one faces in applying the Johansen method are due to structural changes, not just in the

15 A review, Kirchgassner (1991), conjectures that a simultaneous equations model with a larger set of economic variables might be better than the feasible VAR in reducing the number of dummy variables. Needless to say, it requires substantiation.


deterministic trends but more seriously in the variance. Since T is small, especially in the studies of foreign-exchange markets, there is uncertainty about the determination of co-integration rank. All researchers use the relevant economic theories as a guide for the interpretation of the results. Partly because of this practice it is seldom that the economic theories are rejected outright. The most likely contribution of the co-integration analysis to economics seems to be modifications of economic theories in regard to the selection of explanatory variables as shown in Johansen and Juselius (1992) and McNown and Wallace (1992). A systematic inference procedure is desired for the selection. My earlier comment on McNown and Wallace (1992) is just a hint.

Appendix 1
Spectral Analysis

This appendix supplies proofs for several well-known statements on the spectral analysis that appear in the text of Chapters 3 and 7. Mathematical rigour is sacrificed for emphasis on the basic ideas. The readers who want more about the spectral analysis are advised to study Fuller (1976, ch. 7), Granger and Newbold (1986, ch. 2), and Hamilton (1994, ch. 6). More advanced readings are Anderson (1971, chs. 7, 8, 9, and 10) and Brillinger (1981).

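(a) In a sequence of real numbers that extends in both directions, $\ldots, \gamma_{-1}, \gamma_0, \gamma_1, \ldots$, suppose that $\gamma_j = \gamma_{-j}$, $j = 1, 2, \ldots$. Then

$$f(\lambda) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\gamma_j\exp(-i\lambda j) \tag{A1.1}$$

is the Fourier transform of the sequence. In (A1.1) $i = \sqrt{-1}$. Noting that $\exp(-i\lambda j) = \cos\lambda j - i\sin\lambda j$, $\cos\lambda j = \cos\lambda(-j)$, $\sin\lambda j = -\sin\lambda(-j)$, (A1.1) is written as

$$f(\lambda) = \frac{1}{2\pi}\sum_{j=-\infty}^{\infty}\gamma_j\cos\lambda j \tag{A1.1$'$}$$

and also as

$$f(\lambda) = \frac{1}{2\pi}\Bigl(\gamma_0 + 2\sum_{j=1}^{\infty}\gamma_j\cos\lambda j\Bigr). \tag{A1.1$''$}$$

When $f(\lambda)$ is defined by (A1.1) we have for $j = \ldots, -1, 0, 1, \ldots$

$$\int_{-\pi}^{\pi}f(\lambda)\exp(i\lambda j)\,d\lambda = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma_h\int_{-\pi}^{\pi}\exp(-i\lambda(h - j))\,d\lambda = \gamma_j, \tag{A1.2}$$

where the second equality follows from

$$\int_{-\pi}^{\pi}\exp(-i\lambda(h - j))\,d\lambda = \begin{cases}2\pi & \text{if } h = j\\ 0 & \text{if } h \neq j.\end{cases}$$

The expression (A1.2) is called the Fourier inverse transform of $f(\lambda)$.

Given a stationary stochastic process, $\{x_t\}$, such that $E(x_t) = 0$, the autocovariance for lag j is $E(x_tx_{t-j}) = E(x_{t+j}x_t)$, and thus $\gamma_j = \gamma_{-j}$. The Fourier transform of the sequence, $\ldots, \gamma_{-1}, \gamma_0, \gamma_1, \ldots$, is the (power) spectral density function, $f(\lambda)$, where the argument $\lambda$ is called the frequency. The Fourier inverse transform of $f(\lambda)$ is the autocovariance sequence.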
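The pair formed by the Fourier transform and its inverse can be checked numerically. The sketch below assumes the normalization $f(\lambda) = (2\pi)^{-1}(\gamma_0 + 2\sum_{j\ge 1}\gamma_j\cos\lambda j)$ and uses the autocovariances of an MA(1) process with a hypothetical coefficient `theta`; the function names are illustrative.

```python
import numpy as np

def spectral_density(gamma, lam):
    """f(lam) = (gamma_0 + 2 * sum_{j>=1} gamma_j cos(lam * j)) / (2 pi),
    where gamma[j] is the autocovariance at lag j (finitely many non-zero)."""
    j = np.arange(1, len(gamma))
    return (gamma[0] + 2.0 * np.cos(np.outer(lam, j)) @ gamma[1:]) / (2.0 * np.pi)

def inverse_transform(f_vals, lam, j):
    """gamma_j = integral over [-pi, pi) of f(lam) cos(lam * j),
    approximated by a Riemann sum on a uniform grid."""
    step = 2.0 * np.pi / len(lam)
    return float(np.sum(f_vals * np.cos(lam * j)) * step)

theta = 0.6                                            # hypothetical MA(1) coefficient
gamma = np.array([1.0 + theta ** 2, theta, 0.0, 0.0])  # unit innovation variance
lam = np.linspace(-np.pi, np.pi, 512, endpoint=False)
f_vals = spectral_density(gamma, lam)
recovered = np.array([inverse_transform(f_vals, lam, j) for j in range(4)])
```

Because the integrand is a trigonometric polynomial, the Riemann sum on a uniform grid over one full period recovers the autocovariances up to rounding error.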

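(b) Given two sequences, $\ldots, \alpha_{-1}, \alpha_0, \alpha_1, \ldots$ and $\ldots, \gamma_{-1}, \gamma_0, \gamma_1, \ldots$, we are interested in the sequence, $\ldots, \alpha_{-1}\gamma_{-1}, \alpha_0\gamma_0, \alpha_1\gamma_1, \ldots$. Define

Then

where we have used that $\int_{-\pi}^{\pi}\exp(-i\xi(j - h))\,d\xi = 0$ if $j \neq h$, and $= 2\pi$ if $j = h$.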

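(c) If $\{x_t\}$ is a scalar stationary process with $E(x_t) = 0$ and the spectral density function $f(\lambda)$, there is the Cramér representation

$$x_t = \int_{-\pi}^{\pi}\exp(i\lambda t)\,dZ(\lambda),$$

where $Z(\cdot)$ is a complex valued, random function such that

$$E\bigl(dZ(\lambda)\overline{dZ(\lambda')}\bigr) = \begin{cases}f(\lambda)\,d\lambda & \text{if } \lambda = \lambda'\\ 0 & \text{if } \lambda \neq \lambda'.\end{cases}$$

Consider $y_t = \sum_j\alpha_jx_{t-j}$. Then

$$y_t = \int_{-\pi}^{\pi}\exp(i\lambda t)\Bigl(\sum_j\alpha_j\exp(-i\lambda j)\Bigr)dZ(\lambda).$$

The spectral density function of $\{y_t\}$ is

$$\Bigl|\sum_j\alpha_j\exp(-i\lambda j)\Bigr|^2f(\lambda). \tag{A1.3}$$

Setting $\alpha_0 = 1$, $\alpha_1 = -1$, all other $\alpha$s to zero, the spectral density function of $\Delta x_t$ is

$$|1 - \exp(-i\lambda)|^2f(\lambda) = 2(1 - \cos\lambda)f(\lambda).$$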
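The effect of a linear filter on the spectral density can be verified with a few lines of code. This is a minimal sketch with a hypothetical helper `filter_gain`; it evaluates the squared modulus of $\sum_j\alpha_j\exp(-i\lambda j)$, the factor by which the filter multiplies $f(\lambda)$, and applies it to the difference filter.

```python
import numpy as np

def filter_gain(alpha, lam):
    """Squared modulus of sum_j alpha_j exp(-i lam j) for alpha_0, alpha_1, ..."""
    j = np.arange(len(alpha))
    transfer = np.exp(-1j * np.outer(lam, j)) @ np.asarray(alpha, dtype=complex)
    return np.abs(transfer) ** 2

lam = np.linspace(0.0, np.pi, 200)
gain_diff = filter_gain([1.0, -1.0], lam)   # the difference filter 1 - L
closed_form = 2.0 * (1.0 - np.cos(lam))
```

The gain of the difference filter vanishes at frequency zero, which is why differencing removes the spectral mass located there.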

Appendix 2
Wiener (Brownian Motion) Process

Suppose that the pointer on your computer display is initially at the origin of the X-axis, and begins to move either left or right by a non-stochastic amount, $\Delta x$, every $\Delta t$ second. Probabilities of moving left and right are each 1/2, and directions in successive movements are mutually independent. Let $x_i$ be a random variable defined by

$$x_i = \begin{cases}1 & \text{if the } i\text{th movement is to the right}\\ -1 & \text{if the } i\text{th movement is to the left,}\end{cases}$$

and let $[a]$ be the largest integer that does not exceed a. The position of the pointer t seconds after the initial movement is

$$X(t) = \Delta x\,(x_1 + \ldots + x_{[t/\Delta t]}).$$

Since $E(x_i) = 0$ and $\mathrm{var}(x_i) = 1$, $E(X(t)) = 0$ and $\mathrm{var}(X(t)) = (\Delta x)^2[t/\Delta t]$. Let c be a positive real number, and let $\Delta x \to 0$ and $\Delta t \to 0$ while keeping $\Delta x = c\sqrt{\Delta t}$. In the limit we have a continuous time, and

$$E(X(t)) = 0, \qquad \mathrm{var}(X(t)) = c^2t.$$

Moreover, as $\Delta x \to 0$, $[t/\Delta t] \to \infty$, and $X(t)$ is a sum of mutually independent, infinitely many $(\Delta x)x_i$. By virtue of the central-limit theorem $X(t)$ is distributed in the normal distribution with the first- and the second-order moments given above. Finally, since the $x_i$s over two non-overlapping time periods are independent, we have the same independence property in the continuous time obtained by $\Delta x \to 0$.

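This is the motivation behind the Wiener process, or the Brownian motion process, defined on a continuous time $t \ge 0$ by the following conditions,

(i) $w(0) = 0$;

(ii) $w(t)$ is distributed in $N(0, c^2t)$ for all $t > 0$;

(iii) for all $0 \le t_1 < t_2 \le t_3 < t_4$, $w(t_2) - w(t_1)$ and $w(t_4) - w(t_3)$ are independent.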


When $c = 1$, $w(t)$ is called the standard Wiener process. In the present book $w(\cdot)$ is the standard Wiener process, and the one with $c \neq 1$ will be written $cw(t)$. Thus $w(1)$ is $N(0, 1)$. Note that if $s < t$

$$\mathrm{cov}(w(s), w(t)) = E\bigl[w(s)\{w(s) + (w(t) - w(s))\}\bigr] = E\bigl[w(s)^2\bigr] = s.$$

In general $\mathrm{cov}(w(s), w(t)) = \min(s, t)$. A good, elementary explanation of the Wiener process is found in Ross (1983) and Karlin (1975), the former being more elementary than the latter.
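The pointer experiment that motivates the Wiener process is easy to simulate. The sketch below is illustrative only: the step length `dt`, the number of paths, and the seed are arbitrary assumptions, and the sample moments are merely expected to be close to $c^2t$ and $c^2\min(s, t)$.

```python
import numpy as np

rng = np.random.default_rng(12345)
c, dt, n_paths = 1.0, 0.01, 20000
dx = c * np.sqrt(dt)                       # keeps dx = c * sqrt(dt)
n_steps = int(round(1.0 / dt))

# Each movement is +1 (right) or -1 (left) with probability 1/2.
moves = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
X = dx * np.cumsum(moves, axis=1)          # X[:, m-1] is the position at time m*dt

s_idx = n_steps // 2 - 1                   # s = 0.5
t_idx = n_steps - 1                        # t = 1.0
var_t = X[:, t_idx].var()
cov_st = np.mean(X[:, s_idx] * X[:, t_idx])
```

With these settings `var_t` should be near 1 and `cov_st` near 0.5, up to Monte Carlo error.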

Suppose that a scalar process $\{\varepsilon_t\}$ is i.i.d. and $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$, and let $S_T = \varepsilon_1 + \ldots + \varepsilon_T$. Whatever the distribution of $\varepsilon_t$ may be, $T^{-1/2}\sigma^{-1}S_T$ is distributed asymptotically as $T \to \infty$ in $N(0, 1) = w(1)$. For a real number r such that $1 \ge r \ge 0$ let $[Tr]$ be the largest integer not exceeding $Tr$, and denote

$$X_T(r) = T^{-1/2}\sigma^{-1}S_{[Tr]}.$$

$S_{[Tr]}$ is $\varepsilon_1 + \ldots + \varepsilon_t$ for some t and called a partial sum of $(\varepsilon_1, \varepsilon_2, \ldots)$. Given T, $X_T(r)$ may be regarded a stochastic process defined on a continuous time, r, such that $1 \ge r \ge 0$. As $T \to \infty$ we have the following theorems.

(i) For a fixed r, $X_T(r) \xrightarrow{D} w(r)$.

(ii) For fixed $r_1, \ldots, r_m$, $(X_T(r_1), \ldots, X_T(r_m)) \xrightarrow{D} (w(r_1), \ldots, w(r_m))$.

(iii) (ii) holds true even when $m \to \infty$.

These theorems are called Donsker's theorem, and may be regarded as an extension of the central-limit theorem. It is an extension because (i), (ii), and (iii) are concerned with a random function of r instead of a single random variable.

Recall a well-known theorem to the effect that (i) if for a stochastic process $\{X_T\}$ $X_T \xrightarrow{D} x$ as $T \to \infty$ and (ii) if $f(\cdot)$ is a continuous mapping, then $f(X_T) \xrightarrow{D} f(x)$. An analogous theorem holds on a functional of the stochastic process $X_T(r)$. As $T \to \infty$

(iv) $f(X_T(r)) \xrightarrow{D} f(w(r))$.

For example, $X_T(r)^2 \xrightarrow{D} w(r)^2$. Proofs of the above theorems (i)-(iv) require an advanced level of mathematics. A relatively elementary one is in Chung (1974: 217-22). More standard but also more advanced references are Billingsley (1968), Hall and Heyde (1980), Pollard (1984), and Tanaka (1995b).

The above (i), (ii), (iii), and (iv) are called the functional central-limit theoremand also the invariance principle. The Wiener process here is defined over [0, 1].
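The invariance principle can be illustrated by simulation with shocks that are far from normal. The following is a rough sketch under stated assumptions: the shocks are uniform on [-1, 1], the scaled partial-sum process is taken as $X_T(r) = T^{-1/2}\sigma^{-1}S_{[Tr]}$, and the sample size, number of paths, and seed are arbitrary. It uses the facts that $\mathrm{var}(w(r)) = r$ and hence $E\int_0^1 w(r)^2dr = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(2024)
T, n_paths = 200, 5000

# Non-normal shocks: uniform on [-1, 1], so sigma^2 = 1/3.
eps = rng.uniform(-1.0, 1.0, size=(n_paths, T))
sigma = np.sqrt(1.0 / 3.0)

S = np.cumsum(eps, axis=1)                 # partial sums S_1, ..., S_T
X_T = S / (sigma * np.sqrt(T))             # X_T(t/T) for t = 1, ..., T

var_half = X_T[:, T // 2 - 1].var()        # compare with var(w(0.5)) = 0.5
# Riemann-sum approximation of the functional int_0^1 X_T(r)^2 dr, path by path.
integral_sq = (X_T ** 2).mean(axis=1)
mean_integral = integral_sq.mean()         # compare with E int w(r)^2 dr = 0.5
```

The second quantity is a functional of the whole path, which is the kind of statistic to which theorem (iv) is applied in the unit-root literature.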

The convergence in distribution, $\xrightarrow{D}$, should in fact be the weak convergence, but I shall write $\xrightarrow{D}$ throughout the present book because the convergence in distribution and the weak convergence are identical in so far as they are applied in econometrics. Descriptions of $X_T(r)$ more detailed than the above are found in Banerjee et al. (1993: 21-5) and Hamilton (1994: 477-86), and they are useful to understand visually the convergence involved in (i) above.

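On the $\{\varepsilon_t\}$ introduced above let $v_t = \sum_{s=1}^{t}\varepsilon_s$. Then

(A2.1)

Extending (ii) and (iii) it can be shown that

(A2.2)

Appendix 3
Asymptotic Theories involving a Linear Deterministic Trend

The present appendix explains the asymptotic theories that are used in Chapters 4 and 5 to deal with deterministic trends.

A3.1 Asymptotic Normality Involving a Linear Trend

Suppose that

$$x_t = \mu + \beta t + \varepsilon_t,$$

where $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$. Let $\iota' = (1, \ldots, 1)$, $t' = (1, \ldots, T)$, $\varepsilon' = (\varepsilon_1, \ldots, \varepsilon_T)$, $x' = (x_1, \ldots, x_T)$, $X = [\iota, t]$, $D_T = \mathrm{diag}[T^{1/2}, T^{3/2}]$. Also let $\hat\theta' \equiv (\hat\mu, \hat\beta)$ be the OLS estimator of $\theta' = (\mu, \beta)$. We analyse $D_T(\hat\theta - \theta) = (D_T^{-1}X'XD_T^{-1})^{-1}D_T^{-1}X'\varepsilon$. It is easy to see that as $T \to \infty$

$$D_T^{-1}X'XD_T^{-1} \to \begin{bmatrix}1 & 1/2\\ 1/2 & 1/3\end{bmatrix} \equiv Q.$$

The main task here is to investigate

$$D_T^{-1}X'\varepsilon = \begin{bmatrix}T^{-1/2}\sum\varepsilon_t\\ T^{-3/2}\sum t\varepsilon_t\end{bmatrix}. \tag{A3.1}$$

It is well known that $T^{-1/2}\sum\varepsilon_t \xrightarrow{D} N(0, \sigma^2)$. Readers might wonder about $T^{-3/2}\sum t\varepsilon_t$ since the weights on $\varepsilon_t$s increase with t. It can be shown that the vector random variable in (A3.1) jointly converges to a multivariate normal distribution.

To prove this we must show for any non-stochastic $(a_1, a_2)$ that $a_1T^{-1/2}\sum\varepsilon_t + a_2T^{-3/2}\sum t\varepsilon_t$ converges to a scalar normal variate. This term may be written $\sum c_{tT}\varepsilon_t$, where

$$c_{tT} = a_1T^{-1/2} + a_2T^{-3/2}t.$$

As $T \to \infty$, $\sum c_{tT}^2$ is O(1), and the maximum of $c_{tT}^2$ over $t = 1, \ldots, T$ goes to zero. Therefore as $T \to \infty$

$$\frac{\max_{1\le t\le T}c_{tT}^2}{\sum_{t=1}^{T}c_{tT}^2} \to 0. \tag{A3.2}$$

This condition establishes the Lindeberg-Feller condition, which is that for any $\delta\,(>0)$

$$\frac{1}{S_T^2}\sum_{t=1}^{T}c_{tT}^2\int_{|c_{tT}\varepsilon| > \delta S_T}\varepsilon^2f(\varepsilon)\,d\varepsilon \to 0. \tag{A3.3}$$

Here $f(\cdot)$ is the p.d.f., not necessarily normal, of $\varepsilon_t$ common to all t and $S_T^2 = \sum c_{tT}^2$. The reason why (A3.3) follows from (A3.2) is as follows. The left-hand side of (A3.3) is not larger than

$$\int_{|\varepsilon| > \delta S_T/\max_t|c_{tT}|}\varepsilon^2f(\varepsilon)\,d\varepsilon,$$

and the last term goes to zero as $T \to \infty$ due to (A3.2) and existence of the second-order moment, $E(\varepsilon^2)$. I have followed Anderson (1971: 24).

The normal variate to which (A3.1) converges has the covariance matrix, $\sigma^2\lim D_T^{-1}X'XD_T^{-1} = \sigma^2Q$, and the conclusion is that

$$D_T(\hat\theta - \theta) \xrightarrow{D} N(0, \sigma^2Q^{-1}). \tag{A3.4}$$

Notice that $\hat\beta$ is $T^{3/2}$-consistent.

The essential point in the above derivation is that $\sum_{t=1}^{T}t^2 = O(T^3)$ while $t^2$ is at most $T^2$, from which it follows that a weight of one observation in the whole set is made as small as we wish by taking T sufficiently large. An exponential growth violates this condition.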
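The scaling by $D_T = \mathrm{diag}[T^{1/2}, T^{3/2}]$ can be checked directly. The sketch below, with the hypothetical helper `scaled_moment_matrix`, builds the regressor matrix of a constant and a linear trend and compares the scaled moment matrix with its limit, the $2 \times 2$ matrix with rows (1, 1/2) and (1/2, 1/3); the sample size 2000 is an arbitrary choice.

```python
import numpy as np

def scaled_moment_matrix(T):
    """D_T^{-1} X'X D_T^{-1} for X = [1, t], t = 1, ..., T."""
    t = np.arange(1, T + 1, dtype=float)
    X = np.column_stack([np.ones(T), t])
    D_inv = np.diag([T ** -0.5, T ** -1.5])
    return D_inv @ X.T @ X @ D_inv

Q = np.array([[1.0, 0.5],
              [0.5, 1.0 / 3.0]])            # limit of the scaled moment matrix
Q_T = scaled_moment_matrix(2000)
Q_inv = np.linalg.inv(Q)                    # enters the limiting covariance matrix
```

The (2,2) element of `Q_inv` is 12, so the limiting variance of $T^{3/2}(\hat\beta - \beta)$ is $12\sigma^2$ in this setting.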


A3.2 Proofs of (4.17)-(4.19)


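Regarding the model

the OLS estimator of $\theta' = (\mu, \rho)$ is $\hat\theta = (X'X)^{-1}X'x$, where $\iota' = (1, \ldots, 1)$, $x_{-1}' = (x_1, \ldots, x_{T-1})$, $x' = (x_2, \ldots, x_T)$, $X = (\iota, x_{-1})$. $\{\varepsilon_t\}$ is i.i.d. with zero mean. We derive the limiting distribution of $\hat\theta$ when the true value of $\theta$ is $\theta^{0\prime} = (\mu^0, 1)$, where $\mu^0 \neq 0$. $\{x_t\}$ is generated by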

wherejcIt is seen that

Let

Then it follows from the above calculations that $\lim_{T\to\infty}D_T^{-1}X'XD_T^{-1} = \Sigma$.

I now turn to $D_T^{-1}X'\varepsilon$, where $\varepsilon' = (\varepsilon_2, \ldots, \varepsilon_T)$. Noting that

converges in probability to zero, it is seen that

As just shown in Section A3.1


Therefore

In particular, $T^{3/2}(\hat\rho - 1) \xrightarrow{D} N(0, 12\sigma^2/(\mu^0)^2)$.

A3.3 OLS along Equation (5.3)¹

The least-squares estimator in (5.3) plays an important role in testing for DS against TS. The OLS is to be run along

$$x_t = \hat\mu + \hat\beta t + \hat\rho x_{t-1} + \text{residual}. \tag{A3.5}$$

The data matrix of regressors in (A3.5) is $X = [\iota, t, x_{-1}]$, where $x_{-1} = (x_1, \ldots, x_{T-1})'$, and the data vector of the regressand is $x = (x_2, \ldots, x_T)'$. Let

$$\hat\gamma = (\hat\mu, \hat\beta, \hat\rho)' = (X'X)^{-1}X'x. \tag{A3.6}$$

A3.3.1 Difference stationary DGP

Initially the data-generating process is

$$x_t = \theta_0 + \theta_1t + v_t, \tag{A3.7}$$

where $v_t = \sum_{s=1}^{t}\varepsilon_s$ and $\{\varepsilon_t\}$ is i.i.d. with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = \sigma^2$. I assumed that $v_0 = 0$, but note that $\theta_0$ and $x_0$ cannot be distinguished. It will be seen later that neither is involved in $\hat\beta$, $\hat\rho$, and the t- and F-statistics.

Let $v = (v_2, \ldots, v_T)'$, $v_{-1} = (v_1, \ldots, v_{T-1})'$, $\varepsilon = (\varepsilon_2, \ldots, \varepsilon_T)'$, $\varepsilon_{-1} = (\varepsilon_1, \ldots, \varepsilon_{T-1})'$. From (A3.7) it follows that

(A3.7')

and from (A3.6) we get

(A3.8)

Let

Then $XA = [\iota, t, v_{-1}]$. Let

1 I have benefited from a comment by Hiro Toda on an earlier draft of the present section.

Then $[\iota, t, v_{-1}]B_T = [\iota, \tilde t, v_{-1}]$ and $\tilde t'\iota = 0$. Writing $AB_T \equiv A_T$, we have

(A3.9)

and

where $\tilde t = t - (T + 2)/2$. Note that (A3.9) is nothing but (A3.8).

Introduce a lemma,

where $\Delta = b - a_1^2 - a_2^2$, which incidentally can be extended to a higher dimensionality. (A3.9) is equal to

where

(A3.10a)

(A3.10b)

where

Thus from (A3.8) and (A3.9)

(A3.11)

which proves that $\mathrm{plim}\,\hat\rho = 1$, $\mathrm{plim}\,\hat\beta = 0$, $\mathrm{plim}\,\hat\mu = \theta_1$.

The last equation of (A3.11) by itself can be derived as follows. Let $Z = [\iota, \tilde t]$, and $Q_Z = I - Z(Z'Z)^{-1}Z'$. Then $T(\hat\rho - 1) = (T^{-2}x_{-1}'Q_Zx_{-1})^{-1}T^{-1}x_{-1}'Q_Z(x - x_{-1}) = (T^{-2}v_{-1}'Q_Zv_{-1})^{-1}T^{-1}v_{-1}'Q_Z\varepsilon$.

The limiting distribution of $\sigma^{-2}\Delta$ is given in (6.3) in the text, and the limiting distribution of $T(\hat\rho - 1)$ is given in (6.8a). From (A3.11) it is obvious that $T(\hat\rho - 1)$ is invariant to whether $\theta_1 = 0$ or $\neq 0$ in the data-generating process. However, $\hat\beta$ does depend upon whether $\theta_1 = 0$ or $\neq 0$. It is $O_p(T^{-1})$ if $\theta_1 \neq 0$, but $O_p(T^{-3/2})$ if $\theta_1 = 0$.

(i) Multicollinearity between $\hat\beta$ and $\hat\rho$.

Since $T^{-3}\sum\tilde t\varepsilon_t$ and $T^{-4}\sum\tilde tv_{t-1}$ are both $O_p(T^{-3/2})$, (A3.11) also shows that, if $\theta_1 \neq 0$, $\theta_1(\hat\rho - 1) + \hat\beta = o_p(T^{-1})$, which means that $\hat\beta$ and $\hat\rho$ are perfectly correlated in $O_p(T^{-1})$. The correlation is negative (positive) if $\theta_1 > (<)\ 0$. The perfect correlation is caused by the following relation between two regressors, t and $x_{t-1}$: (a) $x_{t-1} = (\theta_0 - \theta_1) + \theta_1t + v_{t-1}$, and as $T \to \infty$ $\theta_1t$ dominates $v_{t-1}$ in so far as $\theta_1 \neq 0$; (b) according to (A3.5) the dominating part of $x_t$ is thus $(\theta_1\hat\rho + \hat\beta)t$. Subtracting $x_{t-1}$ from both sides of (A3.5) it is seen that $\theta_1(\hat\rho - 1) + \hat\beta$ is the coefficient in the effect of t upon $\Delta x_t$, which is expected to be zero.

(ii) F-test statistic.

Let $e = x - X\hat\gamma$. Then $e'e = x'[I - X(X'X)^{-1}X']x$. From (A3.7) it follows that

We have

Thus

Then let $\tilde e$ be the residual vector in the constrained least squares with the constraint, $(\beta, \rho) = (0, 1)$, in (A3.5). Then

(A3.12)

We can also prove that $\mathrm{plim}\,(T - 4)^{-1}e'e = \sigma^2$. The limiting distribution of the F-statistic is given in (6.8c) of the text.

A3.3.2 The trend stationary DGP

Let us turn to the case where the data-generating process is

where $\{u_t\}$ is a stationary process with zero mean and the sequence of the serial correlation coefficients, $r_0 = 1, r_1, \ldots$, and the variance, $\sigma^2$. Let $u' = (u_1, \ldots, u_T)$. Then

Concerning $\hat\gamma$ in (A3.6),

where $A$, $B_T$, and hence $A_T$ are defined in the same way as above, but here $D_T$ is $\mathrm{diag}[T^{1/2}, (T^3/12)^{1/2}, T^{1/2}]$. Therefore

Thus

Therefore $\mathrm{plim}\,\hat\rho = r_1$, $\mathrm{plim}\,\hat\beta = \theta_1(1 - r_1)$, and $\mathrm{plim}\,\hat\mu = \theta_0(1 - r_1) + \theta_1r_1$. If $\theta_1 \neq 0$, $\theta_1(\hat\rho - 1) + \hat\beta = o_p(T^{-1})$, i.e. $\hat\rho$ and $\hat\beta$ are perfectly correlated just in the same way as in the difference-stationary case. This is relevant to some findings in the Bayesian discrimination between the trend and the difference stationarity as noted in Chapter 10 of the text.

Appendix 4
OLS Estimator of Difference-Stationary Autoregressive Process

A4.1 Proof of (6.27)

The limiting result, (6.27), is due to Fuller (1976: 373-7). Here I shall show a proof based on the Wiener process rather than his original. The DGP is (6.23') with $\alpha_1^0 = 1$, i.e. (6.19). Let $(\hat\alpha_1, \ldots, \hat\alpha_p)$ be the OLS of $(\alpha_1, \ldots, \alpha_p)$ along (6.23'). Let X be the data matrix of regressors, $(x_{t-1}, \Delta x_{t-1}, \ldots, \Delta x_{t-p+1})$; $\varepsilon$ be the vector of $\varepsilon_t$; and $D_T = \mathrm{diag}[T, T^{1/2}, \ldots, T^{1/2}]$, which is $p \times p$. We are concerned with $T(\hat\alpha_1 - 1)$, which is the first element of

$$(D_T^{-1}X'XD_T^{-1})^{-1}D_T^{-1}X'\varepsilon. \tag{A4.1}$$

Regarding the (1,1) element of $D_T^{-1}X'XD_T^{-1}$ we see from (6.13) that

(A4.2)

where $\sigma^2$ is the long-run variance of $u_t$, i.e. (6.26). For $(1, j + 1)$ elements, $j = 1, \ldots, p - 1$, we have from (6.14)

(A4.3)

Remaining elements converge in probability to the autocovariances of $\{u_t\}$. Because of (A4.3) $D_T^{-1}X'XD_T^{-1}$ is asymptotically block diagonal, and the (1,1) element is a block by itself.

In the DGP $\Delta x_t = u_t$, and $\{u_t\}$ is constructed by AR(p - 1), (6.20'). Let us invert this AR(p - 1) into an MA, and write

$$u_t = b(L)\varepsilon_t$$

with the same $\{\varepsilon_t\}$ as in (6.20'). Then an expression of $\sigma^2$ alternative to (6.26) is

$$\sigma^2 = \sigma_\varepsilon^2b(1)^2. \tag{A4.4}$$

We have been concerned with the first element of (A4.1). In view of the asymptotic block diagonality of $(D_T^{-1}X'XD_T^{-1})^{-1}$ we are now interested in the first element of $D_T^{-1}X'\varepsilon$, which is $T^{-1}\sum x_{t-1}\varepsilon_t$. Applying (2.8) to $\Delta x_t = u_t$ we see that

The second term on the right-hand side converges to zero in probability, and by virtue of (4.7)

(A4.5)

Combining (A4.2) and (A4.5), and noting (A4.4), it is seen that

Note that $b(1) = (1 - a_1 - \ldots - a_{p-1})^{-1} > 0$. The distribution of $\hat\alpha_1$ involves nuisance parameters $(a_1, \ldots, a_{p-1})$.

As regards the t-statistic,

since $\sigma = \sigma_\varepsilon b(1)$ and $\mathrm{plim}\,s = \sigma_\varepsilon$. Notice that $b(1)$ has disappeared from t, and the limiting distribution of the t-statistic is free from the nuisance parameters.

It has been seen that $D_T^{-1}X'XD_T^{-1}$ is asymptotically block-diagonal. This kind of block diagonality will be observed frequently in the unit-root field between the non-stationary and the stationary variables.

A4.2 Other Statistics

If $\alpha_1 = 1$, $\hat\alpha' = (\hat\alpha_2, \ldots, \hat\alpha_p)$ is asymptotically identical to the OLS estimator of $(a_1, \ldots, a_{p-1})$ in (6.20'). Using the well-known limiting distribution of the latter (see, e.g. Anderson (1971: 164-200)) the limiting distribution of $\hat\alpha$ is given by

where

and

Appendix 5
Mathematics for the VAR, VMA, and VARMA

A bivariate AR may be written, for example, as either

(A5.1)

or

(A5.2)

Econometricians are more familiar with (A5.1), but the system theory adopts (A5.2). I shall give an elementary, non-rigorous explanation of the system theory used in Section 12.2.

I distinguish the words polynomials and infinite power series by restricting the former to finite orders. The matrix on the left-hand side of (A5.2) has a polynomial of the lag operator L in each element. More generally we consider a matrix of which each element is a polynomial function of a scalar argument z. It will be denoted by A(z). The coefficients of the polynomials are assumed to be real numbers. Such a matrix is called a polynomial matrix. When a polynomial matrix is factored below, the factor polynomial matrices must also have real coefficients.

The determinant of a square polynomial matrix, A(z), is defined in the same way as for an ordinary matrix. The determinant will be denoted by det[A(z)]. A(z) is said to be non-singular if det[A(z)] = 0 holds only at a finite number of real or complex values of z, i.e. det[A(z)] is not identically zero. A(z) is singular if and only if there exists a non-zero polynomial vector b(z)' such that b(z)'A(z) = 0.

The k-variate VAR (vector autoregressive process) is represented by

$$A(L)x_t = e_t. \tag{A5.3}$$

A(z) is assumed to be non-singular. We are concerned with the stationarity of $\{x_t\}$ generated by (A5.3) with an i.i.d. $\{e_t\}$. (A5.3) is a linear difference equation for $\{x_t\}$ with $\{e_t\}$ as a forcing function. The stationarity of $\{x_t\}$ is equivalent to the stability of the difference equation, just as in the univariate AR. The stability of (A5.3) means that the roots of the equation for z

$$\det[A(z)] = 0 \tag{A5.4}$$

are all larger than unity in moduli, i.e. all lie outside the unit circle of the complex plane. If the characteristic polynomial of the difference equation is written as in mathematical economics, the stability would be represented by the characteristic roots being less than unity in moduli, as we learn in economic dynamics (see Section 2.3).
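The stability condition can be checked numerically in the simplest case. The sketch below assumes a bivariate first-order VAR, x_t = A_1 x_{t-1} + e_t, so that A(z) = I - A_1 z; this specification, the coefficient values, and the function names are illustrative assumptions and are not taken from the text.

```python
import cmath


def det_roots_var1(a1):
    """Roots in z of det[I - A_1 z] = 0 for a 2x2 coefficient matrix A_1.

    For a 2x2 matrix, det[I - A_1 z] = 1 - tr(A_1) z + det(A_1) z^2.
    """
    (a11, a12), (a21, a22) = a1
    tr = a11 + a22
    dt = a11 * a22 - a12 * a21
    if dt == 0:
        # The determinant is linear (or constant) in z.
        return [] if tr == 0 else [1.0 / tr]
    disc = cmath.sqrt(tr * tr - 4.0 * dt)
    return [(tr + disc) / (2.0 * dt), (tr - disc) / (2.0 * dt)]


def is_stable(a1):
    """True if every root of det[I - A_1 z] = 0 lies outside the unit circle."""
    return all(abs(z) > 1.0 for z in det_roots_var1(a1))


# Hypothetical coefficient matrices, chosen only for illustration.
stable_a1 = [[0.5, 0.1], [0.2, 0.3]]
unit_root_a1 = [[1.0, 0.0], [0.0, 0.5]]

print(is_stable(stable_a1))     # roots are about 4.41 and 1.74
print(is_stable(unit_root_a1))  # one root is exactly z = 1
```

In this first-order case the roots of det[I - A_1 z] = 0 are the reciprocals of the non-zero eigenvalues of A_1, which is why the two ways of stating the stability condition agree.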

In the present appendix and Section 12.2 we frequently refer to the roots of a determinantal equation such as (A5.4) all lying outside the unit circle of the complex plane. In every such statement I simply write that det[A(z)] = 0 has all roots outside the unit circle, omitting words such as 'the equation' and 'the complex plane'.

Let B(z) be a k × k polynomial matrix. Then the k-variate VMA (vector moving average process) is represented by

$$x_t = B(L)e_t. \tag{A5.5}$$

We are concerned with the invertibility of the VMA. To explain the invertibility I revert to an expression such as (A5.1). B(z) may be written $B_0 + B_1 z + \ldots + B_q z^q$, where the Bs are each k × k matrices. B(z) is said to be invertible if one can find a unique sequence of matrices, $A_0, A_1, \ldots$, such that

$$(A_0 + A_1 z + A_2 z^2 + \cdots)B(z) = I_k$$

holds identically and $\sum_{j=0}^{\infty} \|A_j\|$ converges in terms of some measure of the norm $\|\cdot\|$ of the As. A necessary and sufficient condition for the invertibility of B(z) is that the roots of

$$\det[B(z)] = 0 \tag{A5.6}$$

all lie outside the unit circle. This generalizes the well-known invertibility condition for a univariate MA.
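The definition of invertibility can be illustrated with a first-order VMA. The sketch below assumes B(z) = I + B_1 z, for which the inverting sequence is A_j = (-B_1)^j; the matrices, the choice of the maximum-row-sum norm, and the function names are assumptions made for illustration.

```python
def matmul(p, q):
    """Product of two square matrices stored as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inverse_coefficients(b1, n_terms):
    """A_0, ..., A_{n_terms-1} with A_j = (-B_1)^j.

    These satisfy (sum_j A_j z^j)(I + B_1 z) = I term by term, because the
    coefficient of z^j is A_j + A_{j-1} B_1 = 0 for j >= 1.
    """
    n = len(b1)
    neg_b1 = [[-v for v in row] for row in b1]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coeffs = [identity]
    for _ in range(n_terms - 1):
        coeffs.append(matmul(coeffs[-1], neg_b1))
    return coeffs


def norm(m):
    """Maximum absolute row sum, one possible matrix norm."""
    return max(sum(abs(v) for v in row) for row in m)


# Hypothetical B_1 with eigenvalues 0.6 and 0.3, so det[I + B_1 z] = 0
# has its roots at z = -1/0.6 and z = -1/0.3, outside the unit circle.
b1_invertible = [[0.5, 0.2], [0.1, 0.4]]
norms_invertible = [norm(a) for a in inverse_coefficients(b1_invertible, 60)]

# B_1 = I gives det[I + B_1 z] = (1 + z)^2, with roots on the unit circle.
b1_noninvertible = [[1.0, 0.0], [0.0, 1.0]]
norms_noninvertible = [norm(a) for a in inverse_coefficients(b1_noninvertible, 60)]

print(sum(norms_invertible))     # partial sums settle down
print(sum(norms_noninvertible))  # partial sums grow with the number of terms
```

In the first case the norms decay geometrically, so the sum of norms converges; in the second every A_j has norm one and the sum diverges.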

VARMA is written as

$$A(L)x_t = B(L)e_t. \tag{A5.7}$$

Just as a (scalar) polynomial may be factored into two or more (scalar) polynomials, A(z) may be factored into two or more polynomial matrices, and so may B(z). If A(z) and B(z) have a factor polynomial matrix in common, it should be cancelled in (A5.7). When all the common factors are cancelled between A(z) and B(z), $A(z)^{-1}B(z)$ is called the irreducible MFD (matrix fraction description).¹ The irreducibility is assumed throughout the following description.

A matrix with a rational function of z in each element is called a rational matrix. When A(z) is non-singular, $C(z) = A(z)^{-1}B(z)$ is a rational matrix. The determinant of a rational matrix is defined in the same way as for an ordinary matrix, and the concept of non-singularity follows from it in the same way as for a polynomial matrix. In fact,

$$C(z) = (\det[A(z)])^{-1}\tilde{A}(z)B(z),$$

where $\tilde{A}(z)$ is the adjoint (adjunct) of A(z) and hence is a polynomial matrix. $\tilde{C}(z) \equiv \tilde{A}(z)B(z)$ is also a polynomial matrix. Since $\det[C(z)] = (\det[A(z)])^{-k}\det[\tilde{C}(z)]$, the non-singularity of C(z) is equivalent to the non-singularity of $\tilde{C}(z)$.²

1 See Kailath (1980: 367) for a more accurate description of the irreducibility.

I assume that the k × k matrix $C(z) = A(z)^{-1}B(z)$ is non-singular in (A5.7), which implies that B(z) is non-singular. The assumption precludes linear dependency among the elements of $\{x_t\}$ in (A5.7). When C(z) is non-singular, it can be represented in the Smith-McMillan form with the following properties.

$$C(z) = U(z)\Lambda(z)V(z), \tag{A5.8}$$

where
(i) the three matrices on the right-hand side are each k × k;
(ii) U(z) and V(z) are polynomial matrices, and det[U(z)] and det[V(z)] are both non-zero constants not involving z;
(iii) $\Lambda(z) = \mathrm{diag}[\lambda_1(z), \ldots, \lambda_k(z)]$ and $\lambda_i(z) = f_i(z)/g_i(z)$, $i = 1, \ldots, k$, such that
(iiia) $f_i(z)$ and $g_i(z)$ are polynomials not sharing a common factor;
(iiib) $f_i(z) \mid f_{i+1}(z)$, $i = 1, \ldots, k-1$;
(iiic) $g_{i+1}(z) \mid g_i(z)$, $i = 1, \ldots, k-1$;
(iv) $\Lambda(z)$ is uniquely determined by C(z), but U(z) and V(z) are not.

Concerning (iiib) and (iiic) above, a(z) | b(z) means that a(z) divides b(z), i.e. a(z) is a factor of b(z). See Kailath (1980: 443-4) for the Smith-McMillan form.
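As a small illustration of properties (i)-(iv), consider a hypothetical bivariate case in which A(z) and B(z) are diagonal, so that U(z) and V(z) can be taken as identity matrices. The matrices below are assumptions chosen for simplicity and are not the example used in the text.

```latex
A(z) = \begin{pmatrix} 1 - 0.5z & 0 \\ 0 & 1 \end{pmatrix}, \quad
B(z) = \begin{pmatrix} 1 & 0 \\ 0 & 1 - z \end{pmatrix}, \quad
C(z) = A(z)^{-1}B(z)
     = \begin{pmatrix} \dfrac{1}{1 - 0.5z} & 0 \\ 0 & 1 - z \end{pmatrix}.
% Smith-McMillan form with U(z) = V(z) = I_2 and \Lambda(z) = C(z):
f_1(z) = 1, \quad g_1(z) = 1 - 0.5z, \quad f_2(z) = 1 - z, \quad g_2(z) = 1 .
```

Here f_1 divides f_2 and g_2 divides g_1, as (iiib) and (iiic) require. The root z = 1 of f_2(z) = 0 is a root of det[B(z)] = 0, and the root z = 2 of g_1(z) = 0 is a root of det[A(z)] = 0.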

A polynomial matrix of which the determinant is a non-zero constant (i.e. a polynomial of degree zero) is called unimodular. Any product of unimodular matrices is unimodular, and the inverse of a unimodular matrix is unimodular. The reason why unimodular matrices appear in (A5.8) is that elementary row operations and column operations are represented by unimodular matrices. The elementary row operations consist of (i) an interchange of two rows, (ii) addition to a row of a polynomial multiple of another row, and (iii) scaling all elements of a row by a non-zero constant. The elementary column operations are likewise defined.
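That each of the three elementary row operations has a constant, non-zero determinant can be verified directly for 2 × 2 matrices. In the sketch below a polynomial is stored as a list of coefficients in ascending powers of z; this representation, the particular matrices, and the helper names are assumptions made for illustration.

```python
def poly_mul(p, q):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_sub(p, q):
    """Difference p - q of two polynomials."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]


def trim(p, tol=1e-12):
    """Drop trailing (highest-order) coefficients that are numerically zero."""
    p = list(p)
    while len(p) > 1 and abs(p[-1]) < tol:
        p.pop()
    return p


def det2(m):
    """Determinant of a 2x2 polynomial matrix, returned as a polynomial."""
    return trim(poly_sub(poly_mul(m[0][0], m[1][1]), poly_mul(m[0][1], m[1][0])))


def is_unimodular(m):
    """True if the determinant is a non-zero constant."""
    d = det2(m)
    return len(d) == 1 and d[0] != 0.0


ONE, ZERO = [1.0], [0.0]
swap_rows = [[ZERO, ONE], [ONE, ZERO]]               # (i) interchange two rows
add_multiple = [[ONE, [2.0, 0.0, 3.0]], [ZERO, ONE]]  # (ii) row 1 += (2 + 3z^2) * row 2
scale_row = [[[5.0], ZERO], [ZERO, ONE]]              # (iii) scale row 1 by 5
not_unimodular = [[[1.0, -1.0], ZERO], [ZERO, ONE]]   # diag[1 - z, 1]

for name, m in [("swap", swap_rows), ("add", add_multiple),
                ("scale", scale_row), ("diag[1 - z, 1]", not_unimodular)]:
    print(name, det2(m), is_unimodular(m))
```

The last matrix is included as a contrast: its determinant 1 - z depends on z, so multiplying by it is not an elementary operation.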

For example, consider

Then

To express C(z) in a Smith-McMillan form it is convenient to rewrite

(A5.9)

(A5.10)

in what is called the Smith form. A polynomial matrix with rank 2 can be represented in a Smith form,

(A5.11)

Here both $U_1(z)$ and $V_1(z)$ are 2 × 2 and unimodular, and $\Lambda_1(z) = \mathrm{diag}[\lambda_1(z), \lambda_2(z)]$, where $\lambda_1(z) \mid \lambda_2(z)$. How to obtain a Smith form is explained in Kailath (1980: 375-6, 391). It consists of elementary row and column operations. (A5.10) is equal to (A5.11) with

Then the Smith-McMillan form of (A5.9) is (A5.8) with $U(z) = U_1(z)$, $V(z) = V_1(z)$, and

$V(z)^{-1}$ represents an elementary column operation, and the four matrices that form U(z) are each obtained by inverting some matrices that represent elementary row operations.

2 The roots are not necessarily identical. If the highest power of z in det[A(z)] exceeds that of every element of $\tilde{C}(z)$, $\det[C(\pm\infty)] = 0$, while $\det[\tilde{C}(\pm\infty)] \neq 0$.

The above result has a number of implications. First, for any finite value of z in general, and for a root $z_0$ of $f_i(z) = 0$ in particular, $U(z_0)$ and $V(z_0)$ are non-singular, so that the rank of $C(z_0)$ is equal to the rank of $\Lambda(z_0)$. (There should be no contradiction between C(z) being non-singular and $C(z_0)$ being singular.) Second, if $z_0$ is not a root of $f_{k-r}(z) = 0$ nor of $g_1(z) = 0$ but is a root of $f_{k-r+1}(z) = 0$, then $z_0$ is a root of $f_{k-r+2}(z) = 0, \ldots, f_k(z) = 0$, and $\Lambda(z_0)$ has rank $k - r$, which is also the rank of $C(z_0)$. As a third implication I quote a lemma from Hannan and Deistler (1988: 54). Roots of $f_1(z) = 0, \ldots, f_k(z) = 0$ are roots of det[B(z)] = 0, and roots of $g_1(z) = 0, \ldots, g_k(z) = 0$ are roots of det[A(z)] = 0.

For any matrix A its rank will be denoted by $\rho(A)$ below.

In Section 12.2 we consider (A5.7), especially the case where det[A(z)] = 0 has all roots lying outside the unit circle and det[B(z)] = 0 has real unit roots as well as other roots lying outside the unit circle. It follows from the third implication mentioned above that, for some $k - r$ such that $0 \le k - r \le k - 1$, $f_{k-r+1}(1) = f_{k-r+2}(1) = \ldots = f_k(1) = 0$ while $f_1(1), \ldots, f_{k-r}(1)$ are not zero. (The latter is void if $k - r = 0$.) From the assumption about A(z) we have $g_1(1) \neq 0, \ldots, g_k(1) \neq 0$, and it follows that $\rho(\Lambda(1)) = k - r$, which is also equal to $\rho(C(1))$ as stated in the first implication. Since $C(1) = A(1)^{-1}B(1)$ and A(1) is non-singular by the assumption about A(z), $\rho(B(1))$ is also $k - r$.

The conclusions are that C(1) and B(1) have the identical rank, which is equal to the number of non-zero terms among $f_1(1), \ldots, f_k(1)$.

Incidentally, if det[B(z)] = 0 has all roots outside the unit circle and no real unit root, then $\rho(B(1)) = k$.
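The rank statements can be checked on a small numerical case. The sketch below assumes a hypothetical bivariate system with a diagonal A(z) whose roots are outside the unit circle and a B(z) with det[B(z)] = 1 - z, so that k = 2 and r = 1; the matrices and helper names are illustrative and not taken from the text.

```python
def poly_eval(p, z):
    """Evaluate a polynomial (ascending coefficients) at z by Horner's rule."""
    val = 0.0
    for c in reversed(p):
        val = val * z + c
    return val


def eval_matrix(pm, z):
    """Evaluate every element of a polynomial matrix at z."""
    return [[poly_eval(p, z) for p in row] for row in pm]


def rank(m, tol=1e-10):
    """Rank of a numeric matrix by Gaussian elimination with partial pivoting."""
    m = [list(row) for row in m]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Diagonal elements of A(z): 1 - 0.5z and 1 - 0.2z (roots z = 2 and z = 5).
a_diag = [[1.0, -0.5], [1.0, -0.2]]
# B(z) = [[1 - z, 0], [z, 1]], so det[B(z)] = 1 - z has a unit root.
b = [[[1.0, -1.0], [0.0]],
     [[0.0, 1.0], [1.0]]]

b_at_1 = eval_matrix(b, 1.0)
# C(1) = A(1)^{-1} B(1); with a diagonal A(1) this divides row i by its i-th element.
c_at_1 = [[v / poly_eval(a_diag[i], 1.0) for v in row]
          for i, row in enumerate(b_at_1)]

print(rank(b_at_1), rank(c_at_1))  # both equal k - r = 1
print(rank(eval_matrix(b, 0.5)))   # full rank away from the unit root
```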

Reverting to the case where det[B(z)] = 0 has real unit roots, Engle and Yoo (1991) suggest the following rearrangement of (A5.8). Write

(A5.12)

where $\tilde{f}_i(z) = 0$ has all roots lying outside the unit circle. This is possible by virtue of the third implication and the assumption about B(z). Moreover, let us suppose for the exposition in Section 12.2 that all the $m_j$s in (A5.12) are unity. Let

(A5.13)

Then

(A5.14)

Setting ) can be rewritten

(A5.15)

In regard to $\tilde{U}(z)^{-1} = D_g(z)U(z)^{-1}$, since $\det[U(z)^{-1}] = 0$ has no (finite) roots,³ and since det[A(z)] = 0 and hence $\det[D_g(z)] = 0$ have all roots outside the unit circle, $\det[\tilde{U}(z)^{-1}] = 0$ has all roots outside the unit circle. Moreover, since det[U(z)] is a constant not involving z, $U(z)^{-1}$ is a polynomial (rather than rational) matrix. $\tilde{V}(z)$ is a polynomial matrix, and $\det[\tilde{V}(z)] = 0$ has all roots outside the unit circle. $\tilde{V}(L)$ is invertible, but D(L) is non-invertible. Therefore (A5.15) is a VARMA with non-invertible MA, as (A5.7) is. The difference between (A5.7) and (A5.15) is that the presence of the unit root is made more visible in (A5.15).

3 In general, if A(z) is a non-singular k × k polynomial matrix and B(z) is its adjoint matrix, then $\det[B(z)] = (\det[A(z)])^{k-1}$. The roots of det[B(z)] = 0 must also be roots of det[A(z)] = 0. Concerning the text, det[U(z)] = 0 has no roots because U(z) is unimodular. Let $U^{*}(z)$ be the adjoint of U(z). Then $\det[U^{*}(z)] = 0$ has no finite roots. Though $\det[U(z)^{-1}] = 0$ may have $z = \pm\infty$ as a root, it is outside the unit circle.

Appendix 6
Fully Modified Least-Squares Estimator

In Section 13.3 we introduced a mixed Gaussian estimator proposed in Phillips (1991a) for the co-integrated regression in which $\{(\Delta x_t, \Delta y_t)\}$ are serially correlated. For the same model Phillips and Hansen (1990) proposed another mixed Gaussian estimator called the fully modified least squares.

Initially the notations in Section 13.3 will be reproduced. $\{u_t\}$ is a stationary process with zero mean vector, $u_t' = (u_{1t}, u_{2t})$, and $E(u_t u_{t+j}') = \Gamma_j$, $\Gamma = \sum_{j=-\infty}^{\infty} \Gamma_j$, and $\Lambda = \sum_{j=1}^{\infty} \Gamma_j$. The (i, j) elements of $\Gamma$, $\Gamma_h$, and $\Lambda$ are respectively $\gamma_{ij}$, $\gamma_{ij}^{(h)}$, and $\lambda_{ij}$. The additional notation is $\Delta = \Gamma_0 + \Lambda$, of which the (i, j) element is $\delta_{ij}$. Note that neither $\Lambda$ nor $\Delta$ is symmetric in general but $\Gamma$ is. The model is $x_t = \sum_{s=1}^{t} u_{1s}$ and $y_t = b x_t + u_{2t}$. We estimate b from $\{(x_t, y_t)\}$.

The fully modified least-squares estimator of b begins with a consistent estimation of $(\gamma_{11}, \gamma_{21})$ and $(\delta_{11}, \delta_{12})$, but I delay the description of the estimation method. The estimators are denoted by $(\hat{\gamma}_{11}, \hat{\gamma}_{21})$ and $(\hat{\delta}_{11}, \hat{\delta}_{12})$ respectively. Construct

(A6.1)

Then the fully modified least-squares estimator of b is

(A6.2)

It is easy to derive the limiting distribution of $T(b^{+} - b)$. Since

(A6.3)

let us investigate $T^{-1}$ × (the numerator of the right-hand side of (A6.3)). Let $w(r) = (w_1(r), w_2(r))'$ be the Wiener process with covariance matrix $\Gamma$. Then using (13.46) in the text it is seen that

(A6.4)

Therefore in (A6.3)

where $w_{2.1}(r)$ is obtained from w(r) as in (13.1a) and (13.1b). See also Section 13.1.3. Finally we obtain

just as in (13.18) of the text. This is mixed Gaussian because $(w_1(r), w_{2.1}(r))$ has a diagonal covariance matrix, $\mathrm{diag}[\gamma_{11}, \gamma_{22} - \gamma_{21}^2\gamma_{11}^{-1}]$.

The OLS has been modified at two points to obtain $b^{+}$. The first is the introduction of $(\hat{\delta}_{11}, \hat{\delta}_{12})'$ in the numerator of (A6.2). It contributes to cancelling $(\delta_{11}, \delta_{12})'$ that appears in (A6.4). The second is the modification of $y_t$ to get $y_t^{+}$ in (A6.1). It contributes to eliminating the linear effect of $w_1(r)$ from $w_2(r)$, as seen from (A6.3) and (A6.5).
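The two modifications can be put together in a few lines of code. The sketch below assumes the standard Phillips-Hansen arrangement for the scalar model without intercept, namely $y_t^{+} = y_t - \hat{\gamma}_{21}\hat{\gamma}_{11}^{-1}\Delta x_t$ and a correction term $T(\hat{\delta}_{12} - \hat{\gamma}_{21}\hat{\gamma}_{11}^{-1}\hat{\delta}_{11})$ subtracted in the numerator. Because the displays (A6.1) and (A6.2) are not reproduced in this text, this arrangement, the convention x_0 = 0, and the function name are assumptions rather than a transcription of those equations.

```python
def fm_ols(x, y, gamma11, gamma21, delta11, delta12):
    """Sketch of a fully modified least-squares estimate of b in y_t = b x_t + u_2t.

    gamma11, gamma21: estimated elements of the long-run covariance matrix.
    delta11, delta12: estimated elements of the one-sided long-run covariance.
    All four estimates are taken as given inputs here.
    """
    T = len(x)
    ratio = gamma21 / gamma11
    # Delta x_t, with x_0 = 0 (assumed) so that Delta x_1 = x_1.
    dx = [x[0]] + [x[t] - x[t - 1] for t in range(1, T)]
    # Second modification: remove the part of y_t that is linear in Delta x_t.
    y_plus = [y[t] - ratio * dx[t] for t in range(T)]
    # First modification: bias correction subtracted from the numerator.
    correction = T * (delta12 - ratio * delta11)
    sxx = sum(v * v for v in x)
    sxy = sum(x[t] * y_plus[t] for t in range(T))
    return (sxy - correction) / sxx


# Toy data for illustration only.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(fm_ols(x, y, gamma11=1.0, gamma21=0.0, delta11=1.0, delta12=0.0))
```

With gamma21 = 0 and delta12 = 0 both modifications vanish and the function returns the ordinary least-squares slope, which is a convenient sanity check.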

Both $x_t$ and $y_t$ may be extended to vector processes.

Now I return to the estimation of $(\gamma_{11}, \gamma_{21})$ and $(\delta_{11}, \delta_{12})$. Phillips and Hansen (1990) propose to estimate these parameters from OLS residuals non-parametrically in the same way as the spectral density function is estimated. $\Gamma$ is the matrix spectral density function of $u_t = (u_{1t}, u_{2t})'$ at zero frequency. The case where a norm of the matrix spectral density function is lowest at zero frequency in its neighbourhood is a multivariate extension of the Schwert ARMA discussed in Sections 7.2.2-3. The non-parametric estimation of $\Gamma$ would be extremely difficult in such cases. Phillips and Hansen (1990) estimate $\Delta$ in the same way as $\Gamma$. Thus if the Bartlett window is used, $\sum u_t u_{t+k}'$, $k = 0, 1, \ldots$ are weighted with the weights given in (3.4). To the best of my knowledge, not much is known about its finite sample properties.
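A minimal sketch of such a weighted estimation follows. It assumes that the residual vectors already have zero mean, that the truncation point `lag` is chosen by the user, and that the weights are the usual Bartlett weights 1 - k/(lag + 1); whether these coincide exactly with the weights in (3.4) is an assumption, since (3.4) is not reproduced here. The function names are illustrative.

```python
def autocov(u, k):
    """(1/T) * sum_t u_t u_{t+k}' for a list of 2-vectors u and k >= 0."""
    T = len(u)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(T - k):
        for i in range(2):
            for j in range(2):
                g[i][j] += u[t][i] * u[t + k][j]
    return [[v / T for v in row] for row in g]


def long_run_matrices(u, lag):
    """Bartlett-weighted estimates of Gamma (two-sided) and Delta (one-sided).

    Gamma sums the autocovariances over k = -lag, ..., lag, using the fact
    that the autocovariance at -k is the transpose of the one at k.
    Delta sums them over k = 0, ..., lag only.
    """
    gamma = autocov(u, 0)
    delta = [row[:] for row in gamma]
    for k in range(1, lag + 1):
        w = 1.0 - k / (lag + 1.0)  # Bartlett weight (assumed form)
        gk = autocov(u, k)
        for i in range(2):
            for j in range(2):
                delta[i][j] += w * gk[i][j]
                gamma[i][j] += w * (gk[i][j] + gk[j][i])
    return gamma, delta


# Toy residuals for illustration only.
u = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
gamma_hat, delta_hat = long_run_matrices(u, lag=1)
print(gamma_hat)
print(delta_hat)
```

The estimate of $\Gamma$ is symmetric by construction, while the estimate of $\Delta$ in general is not, in line with the remark on symmetry made when the notation was introduced.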

The fully modified least-squares estimator is asymptotically equivalent to the estimator in (13.53) of Section 13.3. Users have to choose between the two estimators. A difficulty with the latter is how to choose the lag and forward orders in (13.53). On the other hand, the fully modified least-squares method faces the problem of how to choose the truncation point in the non-parametric estimation of $\Gamma$ and $\Delta$. In the simulation study conducted by Hargreaves (1993) the fully modified least-squares method fares better than the one in Section 13.3. As I see it, however, the study does not include the multivariate extension of the Schwert ARMA explained in Section 7.2.2. I expect that the method in Section 13.3 is more robust to the Schwert ARMA than the fully modified least squares. This expectation rests on the fact that the augmented Dickey-Fuller test is more robust to the Schwert MA in the univariate analysis. An experimental study is required. Inder (1993) is partly relevant to the present problem.

(A6.6)

(A6.5)


In the univariate analysis of US macroeconomic data presented in Chapter 9 I have found none indicative of the univariate Schwert ARMA in the post-war quarterly data.

Phillips (1993) applies the idea of fully modified least squares to the estimation of a possibly co-integrated VAR, and obtains results that are remarkable in terms of the asymptotic theory.


References

Agiakloglou, C., and Newbold, P. (1992), 'Empirical Evidence on Dickey-Fuller-Type Tests', Journal of Time Series Analysis, 13: 471-83.

Ahmed, Shaghil, Ickes, Barry W., Wang, Ping, and Yoo, Byung Sam (1993), 'International Business Cycles', American Economic Review, 83: 335-59.

Ahn, Sung K. (1993), 'Some Tests for Unit Roots in Autoregressive-Integrated-Moving Average Models with Deterministic Trends', Biometrika, 80: 855-68.

Ahn, Sung K., and Reinsel, Gregory C. (1988), 'Nested Reduced-Rank Autoregressive Models for Multiple Time Series', Journal of the American Statistical Association, 83: 849-56.

Ahn, Sung K., and Reinsel, Gregory C. (1990), 'Estimation of Partially Non-stationary Multivariate Autoregressive Model', Journal of the American Statistical Association, 85: 813-23.

Akaike, Hirotsugu (1973), 'Information Theory and an Extension of the Likelihood Principle', in B. N. Petrov and F. Csaki (eds.), Proceedings of the Second International Symposium of Information Theory, Akademiai Kiado, 267-81.

Alogoskoufis, George, and Smith, Ron (1991), 'On Error Correction Models: Specification, Interpretation, Estimation', Journal of Economic Surveys, 5: 97-128.

Anderson, T. W. (1951), 'Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal Distributions', Annals of Mathematical Statistics, 22: 327-51.

Anderson, T. W. (1971), The Statistical Analysis of Time Series, Wiley, New York.

Anderson, T. W., and Kunitomo, Naoto (1992), 'Tests of Overidentification and Predeterminedness in Simultaneous Equation Models', Journal of Econometrics, 54: 49-78.

Anderson, T. W., and Kunitomo, Naoto (1994), 'Asymptotic Robustness of Tests of Overidentification and Predeterminedness', Journal of Econometrics, 62: 383-414.

Ansley, Craig F. (1979), 'An Algorithm for the Exact Likelihood of a Mixed Autoregressive-Moving Average Process', Biometrika, 66: 59-65.

Aoki, Masanao (1968), 'Control of Large Scale Dynamic Systems by Aggregation', IEEE Transactions on Automatic Control, AC-13: 246-53.

Aoki, Masanao (1990), State Space Modeling of Time Series, 2nd rev. edn., Springer-Verlag, Berlin.

Ardeni, Pier Giorgio, and Lubian, Diego (1991), 'Is There Trend Reversion in Purchasing Power Parity', European Economic Review, 35: 1035-55.

Baillie, Richard T., and Bollerslev, Tim (1989), 'Common Stochastic Trends in a System of Exchange Rates', Journal of Finance, 44: 167-81.

Baillie, Richard T., and Bollerslev, Tim (1994), 'Cointegration, Fractional Cointegration, and Exchange Rate Dynamics', Journal of Finance, 49: 737-45.

Baillie, Richard T., and Selover, David (1987), 'Cointegration and Models of Exchange Rate Determination', International Journal of Forecasting, 3: 43-51.

Balke, Nathan S., and Fomby, Thomas B. (1991), 'Shifting Trends, Segmented Trends, and Infrequent Permanent Shocks', Journal of Monetary Economics, 28: 61-85.

Banerjee, Anindya, Dolado, Juan, Galbraith, John W., and Hendry, David F. (1993), Co-integration, Error-Correction, and the Econometric Analysis of Non-stationary Data, Oxford University Press.

Banerjee, Anindya, Lumsdaine, Robin L., and Stock, James H. (1992), 'Recursive and Sequential Tests of the Unit-Root and Trend-Break Hypotheses: Theory and International Evidence', Journal of Business and Economic Statistics, 10: 271-87.

Basawa, I. V., Mallik, A. K., McCormick, W. P., Reeves, J. H., and Taylor, R. L. (1991a), 'Bootstrapping Unstable First Order Autoregressive Processes', Annals of Statistics, 19: 1098-111.

Basawa, I. V., Mallik, A. K., McCormick, W. P., Reeves, J. H., and Taylor, R. L. (1991b), 'Bootstrap Test of Significance and Sequential Bootstrap Estimation for Unstable First Order Autoregressive Processes', Communications in Statistics, Theory and Methods, 20: 1015-26.

Beaudry, Paul, and Koop, Gary (1993), 'Do Recessions Permanently Change Output?', Journal of Monetary Economics, 31: 149-63.

Beaulieu, J. Joseph, and Miron, Jeffrey A. (1993), 'Seasonal Unit Roots in Aggregate US Data', Journal of Econometrics, 55: 305-28.

Berger, James O. (1985), Statistical Decision Theory and Bayesian Analysis, 2nd edn., Springer-Verlag, Berlin.

Berger, James O., and Delampady, Mohan (1987), 'Testing Precise Hypotheses', Statistical Science, 2: 317-52.

Berger, James O., and Sellke, Thomas (1987), 'Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence', Journal of the American Statistical Association, 82: 112-22.

Bernanke, Ben S. (1986), 'Alternative Explanations of the Money-Income Correlation', Carnegie-Rochester Conference Series on Public Policy, 25: 49-100.

Beveridge, Stephen, and Nelson, Charles R. (1981), 'A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the "Business Cycle"', Journal of Monetary Economics, 7: 151-74.

Bewley, R. A. (1979), 'The Direct Estimation of the Equilibrium Response in a Linear Dynamic Model', Economics Letters, 3: 357-61.

Billingsley, Patrick (1968), Convergence of Probability Measures, John Wiley, New York.

Blanchard, Olivier Jean (1989), 'A Traditional Interpretation of Macroeconomic Fluctuations', American Economic Review, 79: 1146-64.

Blanchard, Olivier Jean, and Quah, Danny (1989), 'The Dynamic Effects of Aggregate Demand and Supply Disturbances', American Economic Review, 79: 655-73.

Bollerslev, Tim, and Engle, Robert F. (1993), 'Common Persistence in Conditional Variances', Econometrica, 61: 167-86.

Boswijk, H. Peter (1994), 'Testing for an Unstable Root in Conditional and Structural Error Correction Models', Journal of Econometrics, 63: 37-60.

Box, G. E. P., and Jenkins, G. M. (1976), Time Series Analysis: Forecasting and Control, 2nd edn., Holden-Day, San Francisco.

Box, G. E. P., and Tiao, G. C. (1977), 'A Canonical Analysis of Multiple Time Series', Biometrika, 64: 355-65.

Brillinger, David R. (1981), Time Series: Data Analysis and Theory, 2nd edn., Holden-Day, San Francisco.

Brodin, P. A., and Nymoen, R. (1992), 'Wealth Effects and Exogeneity: The Norwegian Consumption Function 1966(1)-1989(4)', Oxford Bulletin of Economics and Statistics, 54: 431-54.

Campbell, John Y. (1987), 'Does Saving Anticipate Declining Labor Income? An Alternative Test of the Permanent Income Hypothesis', Econometrica, 55: 1249-273.

Campbell, John Y., and Deaton, Angus (1989), 'Why is Consumption so Smooth?', Review of Economic Studies, 56: 357-74.

Campbell, John Y., and Mankiw, N. Gregory (1987), 'Are Output Fluctuations Transitory?', Quarterly Journal of Economics, 102: 857-80.

Campbell, John Y., and Mankiw, N. Gregory (1989), 'International Evidence on the Persistence of Economic Fluctuations', Journal of Monetary Economics, 23: 319-33.

Campbell, John Y., and Perron, Pierre (1991), 'Pitfalls and Opportunities: What Macroeconomists should know about Unit Roots', Macroeconomics Annual, 1991, NBER, 141-201.

Campbell, John Y., and Shiller, Robert J. (1987), 'Cointegration and Tests of Present Value Models', Journal of Political Economy, 95: 1062-88, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

Campbell, John Y., and Shiller, Robert J. (1988), 'Interpreting Cointegrated Models', Journal of Economic Dynamics and Control, 12: 505-22.

Canarella, Giorgio, Pollard, Stephen K., and Lai, Kon S. (1990), 'Cointegration between Exchange Rates and Relative Prices: Another View', European Economic Review, 34: 1303-22.

Casella, George, and Berger, Roger L. (1987), 'Reconciling Bayesian and Frequentist Evidence in the One-Sided Testing Problem', Journal of the American Statistical Association, 82: 106-11.

Chan, N. H., and Wei, C. Z. (1988), 'Limiting Distributions of Least Squares Estimates of Unstable Autoregressive Processes', Annals of Statistics, 16: 367-401.

Chao, John C., and Phillips, Peter C. B. (1994), 'Bayesian Model Selection in Partially Nonstationary Vector Autoregressive Processes with Reduced Rank Structure', mimeo.

Cheung, Yin-Wong (1993), 'Long Memory in Foreign Exchange Rates', Journal of Business and Economic Statistics, 11: 93-101.

Choi, In (1992), 'Durbin-Hausman Tests for a Unit Root', Oxford Bulletin of Economics and Statistics, 54: 289-304.

Choi, In (1993), 'Asymptotic Normality of the Least-Squares Estimates for Higher Order Autoregressive Integrated Processes with some Applications', Econometric Theory, 9: 263-82.

Choi, In (1994), 'Spurious Regressions and Residual-Based Tests for Cointegration when Regressors are Cointegrated', Journal of Econometrics, 60: 313-20.

Christiano, Lawrence J., and Eichenbaum, Martin (1990), 'Unit Roots in Real GNP: Do We know, and Do We Care?', Carnegie-Rochester Conference Series on Public Policy, 32: 7-62.

Chung, Kai Lai (1974), A Course in Probability Theory, Academic Press, New York.

Clark, Peter K. (1987), 'The Cyclical Component of US Economic Activity', Quarterly Journal of Economics, 102: 797-814.

Clark, Peter K. (1988), 'Nearly Redundant Parameters and Measure of Persistence in Economic Time Series', Journal of Economic Dynamics and Control, 12: 447-61.

Clark, Peter K. (1989), 'Trend Reversion in Real Output and Unemployment', Journal of Econometrics, 40: 15-32.

Clements, Michael P., and Mizon, Graham E. (1991), 'Empirical Analysis of Macroeconomic Time Series', European Economic Review, 35: 887-932.

Cochrane, John H. (1988), 'How Big is the Random Walk in GNP?', Journal of Political Economy, 96: 893-920.

Cogley, Timothy (1990), 'International Evidence on the Size of the Random Walk in Output', Journal of Political Economy, 98: 501-18.

Copeland, Laurence S. (1991), 'Cointegration Tests with Daily Exchange Rate Data', Oxford Bulletin of Economics and Statistics, 53: 185-98.

Cox, D. R. (1961), 'Tests of Separate Families of Hypotheses', Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, I, University of California Press, 105-23.

Cox, D. R., and Hinkley, D. V. (1974), Theoretical Statistics, Chapman & Hall, London.

Davidson, James (1991), 'The Cointegration Properties of Vector Autoregression Models', Journal of Time Series Analysis, 12: 41-62.

Davidson, James (1994), 'Identifying Cointegrating Regressions by the Rank Condition', Oxford Bulletin of Economics and Statistics, 56: 105-9.

Davidson, James, Hendry, David F., Srba, Frank, and Yeo, Stephen (1978), 'Econometric Modelling of the Aggregate Time Series Relationship between the Consumers' Expenditure and Income in the United Kingdom', Economic Journal, 88: 661-92.

Deaton, Angus (1987), 'Life-Cycle Models of Consumption: Is the Evidence Consistent with the Theory', in T. F. Bewley (ed.), Advances in Econometrics, the 5th World Congress, II, Cambridge University Press, 121-48.

DeGroot, Morris H. (1973), 'Doing What Comes Naturally: Interpreting a Tail Area as a Posterior Probability or as a Likelihood Ratio', Journal of the American Statistical Association, 68: 966-9.

DeJong, David N. (1992), 'Co-integration and Trend-Stationarity in Macroeconomic Time Series', Journal of Econometrics, 52: 347-70.

DeJong, David N., Nankervis, John C., Savin, N. E., and Whiteman, Charles H. (1992a), 'The Power Problems of Unit Root Tests in Time Series with Autoregressive Errors', Journal of Econometrics, 53: 323-43.

DeJong, David N., Nankervis, John C., Savin, N. E., and Whiteman, Charles H. (1992b), 'Integration versus Trend-Stationarity in Time Series', Econometrica, 60: 423-33.

DeJong, David N., and Whiteman, Charles H. (1991a), 'The Temporal Stability of Dividends and Stock Prices: Evidence from the Likelihood Function', American Economic Review, 81: 600-17.

DeJong, David N., and Whiteman, Charles H. (1991b), 'Reconsidering "Trends and Random Walks in Macroeconomic Time Series"', Journal of Monetary Economics, 28: 221-54.

DeLong, J. Bradford, and Summers, Lawrence H. (1988), 'How does Macroeconomic Policy affect Output?', Brookings Papers on Economic Activity, 1988, No. 2: 433-80.

Demery, D., and Duck, N. W. (1992), 'Are Economic Fluctuations really Persistent? A Reinterpretation of Some International Evidence', Economic Journal, 102: 1094-101.

Diba, Behzad T., and Grossman, Herschel I. (1988), 'Explosive Rational Bubbles in Stock Prices?', American Economic Review, 78: 520-30.

Dickey, David A., and Fuller, Wayne A. (1979), 'Distribution of the Estimators for Autoregressive Time Series with a Unit Root', Journal of the American Statistical Association, 74: 427-31.

Dickey, David A., and Fuller, Wayne A. (1981), 'Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root', Econometrica, 49: 1057-72.

Diebold, Francis X. (1988), Empirical Modelling of Exchange Rate Dynamics, Lecture Notes in Economics and Mathematical Systems, No. 303, Springer-Verlag, Berlin.

Diebold, Francis X. (1993), 'Discussion, the Effect of Seasonal Adjustment Filters on Tests for a Unit Root', Journal of Econometrics, 55: 99-103.

Diebold, Francis X., Gardeazabal, Javier, and Yilmaz, Kamil (1994), 'On Cointegration and Exchange Rate Dynamics', Journal of Finance, 49: 727-35.

Diebold, Francis X., Husted, Steven, and Rush, Mark (1991), 'Real Exchange Rates under the Gold Standard', Journal of Political Economy, 99: 1252-71.

Dolado, Juan, Galbraith, John W., and Banerjee, Anindya (1991), 'Estimating Intertemporal Quadratic Adjustment Cost Models with Integrated Series', International Economic Review, 32: 919-36.

Drobny, A., and Hall, S. G. (1989), 'An Investigation of the Long-Run Properties of Aggregate Non-Durable Consumers' Expenditure in the United Kingdom', Economic Journal, 99: 454-60.

Durbin, J., and Watson, G. S. (1971), 'Testing for Serial Correlation in Least Squares Regression, III', Biometrika, 58: 1-19.

Durlauf, Steven N. (1989), 'Output Persistence, Economic Structure, and the Choice of Stabilization Policy', Brookings Papers on Economic Activity, 1989/2: 69-116.

Durlauf, Steven N., and Phillips, Peter C. B. (1988), 'Trends versus Random Walks in Time Series Analysis', Econometrica, 56: 1333-54.

Dwyer, Gerald P. Jr., and Wallace, Myles S. (1992), 'Cointegration and Market Efficiency', Journal of International Money and Finance, 11: 318-27.

Elliott, Graham, Rothenberg, Thomas J., and Stock, James H. (1992), 'Efficient Tests for an Autoregressive Unit Root', National Bureau of Economic Research, Technical Working Paper No. 130.

Engle, Robert F., and Granger, C. W. J. (1987), 'Co-integration and Error Correction: Representation, Estimation, and Testing', Econometrica, 55: 251-76, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

Engle, Robert F., Granger, C. W. J., and Hallman, J. J. (1989), 'Merging Short- and Long-Run Forecasts: An Application of Seasonal Cointegration to Monthly Electricity Sales Forecasting', Journal of Econometrics, 40: 45-62, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

Engle, Robert F., Hendry, David F., and Richard, Jean-Francois (1983), 'Exogeneity', Econometrica, 51: 277-304.

Engle, Robert F., and Yoo, Byung Sam (1987), 'Forecasting and Testing in Co-integrated Systems', Journal of Econometrics, 35: 143-59, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

Engle, Robert F., and Yoo, Byung Sam (1991), 'Cointegrated Economic Time Series: An Overview with New Results', in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press, 237-66.

Engsted, Tom (1993), 'Cointegration and Cagan's Model of Hyperinflation under Rational Expectations', Journal of Money, Credit, and Banking, 25: 350-60.

Ericsson, Neil R. (1992), 'Cointegration, Exogeneity, and Policy Analysis: An Overview', Journal of Policy Modeling, 14: 251-80.

Fama, Eugene F., and French, Kenneth R. (1988), 'Permanent and Temporary Components of Stock Prices', Journal of Political Economy, 96: 246-73.

Ferguson, Thomas S. (1967), Mathematical Statistics: A Decision Theoretic Approach, Academic Press, New York.

Flavin, Marjorie A. (1981), 'The Adjustment of Consumption to Changing Expectations about Future Income', Journal of Political Economy, 89: 974-1009.

Friedman, Benjamin M., and Kuttner, Kenneth N. (1992), 'Money, Income, Prices, and Interest Rates', American Economic Review, 82: 472-92.

Friedman, Benjamin M., and Kuttner, Kenneth N. (1993), 'Another Look at the Evidence on Money-Income Causality', Journal of Econometrics, 57: 189-203.

Froot, Kenneth A., and Obstfeld, Maurice (1991), 'Intrinsic Bubbles: The Case of Stock Prices', American Economic Review, 81: 1189-214.

Fukushige, M., Hatanaka, M., and Koto, Y. (1994), 'Testing for the Stationarity and the Stability of Equilibrium', in C. A. Sims (ed.), Advances in Econometrics, Sixth World Congress, I, Cambridge University Press, 3-45.

Fuller, Wayne A. (1976), Introduction to Statistical Time Series, John Wiley, New York.

Gali, Jordi (1992), 'How well does the IS-LM Model fit Postwar U.S. Data', Quarterly Journal of Economics, 107: 709-38.

Geweke, John (1984), 'Inference and Causality in Economic Time Series Models', in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, II, North-Holland, Amsterdam.

Ghysels, Eric (1994), 'On the Economics and Econometrics of Seasonality', in C. A. Sims (ed.), Advances in Econometrics, Sixth World Congress, Cambridge University Press, 257-316.

Ghysels, Eric, and Perron, Pierre (1993), 'The Effect of Seasonal Adjustment Filters on Tests for a Unit Root', Journal of Econometrics, 55: 57-98.

Gonzalo, Jesus, and Granger, Clive W. J. (1991), 'Estimation of Common Long-Memory Components in Cointegrated Systems', mimeo.

Granger, C. W. J. (1969), 'Investigating Causal Relations by Econometric Models and Cross-Spectral Methods', Econometrica, 37: 424-38.

Granger, C. W. J. (1981), 'Some Properties of Time Series Data and Their Use in Econometric Model Specification', Journal of Econometrics, 16: 121-30.

Granger, C. W. J. (1986), 'Developments in the Study of Cointegrated Economic Variables', Oxford Bulletin of Economics and Statistics, 48: 213-28, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

Granger, C. W. J. (1988a), 'Models that Generate Trends', Journal of Time Series Analysis, 9: 329-43.

Granger, C. W. J. (1988b), 'Some Recent Developments in a Concept of Causality', Journal of Econometrics, 39: 199-211.

Granger, C. W. J. (1991), 'Some Recent Generalizations of Cointegration and the Analysis of Long-run Relationships', in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press, 277-87.

Granger, C. W. J. (1993), 'What are we learning about the Long-Run?', Economic Journal, 103: 307-17.

Granger, C. W. J., and Hallman, Jeff (1991), 'Long Memory Series with Attractors', Oxford Bulletin of Economics and Statistics, 53: 11-26.

Granger, C. W. J., and Joyeux, Roselyne (1980), 'An Introduction to Long-Memory Time Series Models and Fractional Differencing', Journal of Time Series Analysis, 1: 15-29.

Page 288: Hatanaka econometry

References 275

Granger, C. W. J., and Lee, Hahn S. (1991), 'An Introduction to Time-Varying Parameter Cointegration', in P. Hackl and A. H. Westlund (eds.), Economic Structural Change, Springer-Verlag, 139-57.

and Lee, T.-H. (1989), 'Investigation of Production, Sales and Inventory Relationships Using Multicointegration and Non-Symmetric Error Correction Models', Journal of Applied Econometrics, 4: 145-59.

(1990), 'Multicointegration', in G. F. Rhodes and T. B. Fomby (eds.), Advances in Econometrics, 8: 71-84, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

and Lin, Jin-Lung (1993), 'Causality in the Long Run', Econometric Theory, forthcoming.

and Newbold, P. (1974), 'Spurious Regressions in Econometrics', Journal of Econometrics, 2: 111-20.

(1986), Forecasting Economic Time Series, 2nd edn., Academic Press, New York.

and Teräsvirta, T. (1993), Modelling Nonlinear Economic Relationships, Oxford University Press.

and Weiss, A. A. (1983), 'Time Series Analysis of Error-Correction Models', in S. Karlin, T. Amemiya, and L. Goodman (eds.), Studies in Econometrics, Time Series, and Multivariate Statistics, Academic Press, New York.

Gregory, Allan W., Pagan, Adrian R., and Smith, Gregor W. (1993), 'Estimating Linear Quadratic Models with Integrated Processes', in P. C. B. Phillips (ed.), Models, Methods, and Applications of Econometrics: Essays in Honour of A. R. Bergstrom, Blackwell, Oxford, 220-39.

Grenander, Ulf, and Rosenblatt, Murray (1957), Statistical Analysis of Stationary Time Series, Wiley, New York.

Hafer, R. W., and Jansen, Dennis W. (1991), 'The Demand for Money in the United States: Evidence from Cointegration Tests', Journal of Money, Credit, and Banking, 23: 155-68.

Hakkio, Craig S. (1986), 'Does the Exchange Rate follow a Random Walk', Journal of International Money and Finance, 5: 221-9.

and Rush, Mark (1989), 'Market Efficiency and Cointegration and Application to the Sterling and Deutschmark Exchange Markets', Journal of International Money and Finance, 8: 75-88.

Haldrup, Niels (1994), 'The Asymptotics of Single-Equation Cointegration Regressions with I(1) and I(2) Variables', Journal of Econometrics, 63: 153-81.

Hall, Alastair (1989), 'Testing for a Unit Root in the Presence of Moving Average Errors', Biometrika, 76: 49-56.

(1992a), 'Joint Hypothesis Tests for a Random Walk based on Instrumental Variable Estimators', Journal of Time Series Analysis, 13: 29-45.

(1992b), 'Testing for a Unit Root in Time Series using Instrumental Variable Estimators with Pretest Data Based Model Selection', Journal of Econometrics, 54: 223-50.

Hall, Anthony D., Anderson, Heather M., and Granger, Clive W. J. (1992), 'A Cointegration Analysis of Treasury Bill Rates', Review of Economics and Statistics, 75: 116-26.

Hall, P., and Heyde, C. C. (1980), Martingale Limit Theory and its Applications, Academic Press, New York.

Hall, Robert E. (1978), 'Stochastic Implications of the Life Cycle Permanent Income Hypothesis: Theory and Evidence', Journal of Political Economy, 86: 971-87.

Hall, S. G. (1986), 'An Application of the Granger and Engle Two-Step Estimation Procedure to United Kingdom Aggregate Wage Data', Oxford Bulletin of Economics and Statistics, 48: 229-39.

Hamilton, James D. (1994), Time Series Analysis, Princeton University Press.

Han, Hsiang-Ling, and Ogaki, Masao (1991), 'Consumption, Income, and Cointegration, Further Analysis', mimeo.

Hannan, E. J., and Deistler, Manfred (1988), The Statistical Theory of Linear Systems, John Wiley, New York.

Hansen, Bruce E. (1992a), 'Efficient Estimation and Testing of Cointegrating Vectors in the Presence of Deterministic Trends', Journal of Econometrics, 53: 87-121.

(1992b), 'Heteroskedastic Cointegration', Journal of Econometrics, 54: 139-58.

Hargreaves, Colin (1993), 'A Review of Methods of Estimating Cointegrating Relationships', mimeo.

Harris, David, and Inder, Brett (1992), 'A Test of the Null Hypothesis of Cointegration', mimeo.

Harrison, J. Michael, and Kreps, David M. (1979), 'Martingales and Arbitrage in Multiperiod Securities Market', Journal of Economic Theory, 20: 381-408.

Harvey, A. C. (1985), 'Trends and Cycles in Macroeconomic Time Series', Journal of Business and Economic Statistics, 3: 216-27.

(1989), Forecasting Structural Time Series Models and the Kalman Filter, Cambridge University Press.

(1990), The Econometric Analysis of Time Series, 2nd edn., MIT Press, Cambridge, Mass.

Hatanaka, Michio, and Koto, Yasuji (1994), 'Are There Unit Roots in Real Economic Variables? (An Encompassing Analysis of Difference and Trend Stationarity)', mimeo.

and Odaki, Mitsuhiro (1983), 'Policy Analyses with and without A Priori Conditions', Economic Studies Quarterly, 34: 193-210.

Hayashi, Fumio (1982), 'The Permanent Income Hypothesis: Estimation and Testing by Instrumental Variables', Journal of Political Economy, 90: 895-916.

Hendry, D. F. (1979), 'Predictive Failure and Econometric Modelling in Macroeconomics: The Transactions Demand for Money', in P. Ormerod (ed.), Modelling the Economy, Heinemann, London, 217-42.

(1987), 'Econometric Methodology: A Personal Perspective', in T. F. Bewley (ed.), Advances in Econometrics, Fifth World Congress, II, Cambridge University Press, 29-48.

(1993), Econometrics, Alchemy or Science?, Blackwell, Oxford.

Hendry, D. F., and Mizon, Grayham E. (1978), 'Serial Correlation as a Convenient Simplification, Not a Nuisance: A Comment on a Study of the Demand for Money by the Bank of England', Economic Journal, 88: 549-63.

(1993), 'Evaluating Dynamic Econometric Models by Encompassing the VAR', in P. C. B. Phillips (ed.), Models, Methods, and Applications of Econometrics: Essays in Honour of A. R. Bergstrom, Blackwell, Oxford, 272-300.

and Neale, Adrian J. (1991), 'A Monte Carlo Study of the Effects of Structural Breaks on Tests for Unit Roots', in P. Hackl and A. H. Westlund (eds.), Economic Structural Change, Springer-Verlag, Berlin, 95-119.

Pagan, Adrian R., and Sargan, J. Dennis (1984), 'Dynamic Specification', in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, II, North-Holland, Amsterdam.

and Richard, Jean-François (1982), 'On the Formulation of Empirical Model in Dynamic Econometrics', Journal of Econometrics, 20: 3-33.

and von Ungern-Sternberg, Thomas (1980), 'Liquidity and Inflation Effects on Consumers' Expenditure', in A. Deaton (ed.), Essays in the Theory and Measurement of Consumers' Behaviour, Cambridge University Press, 237-60.

Hoffman, Dennis L., and Rasche, Robert H. (1991), 'Long-Run Income and Interest Elasticities of Money Demand in the United States', Review of Economics and Statistics, 74: 665-74.

Hosking, J. R. M. (1981), 'Fractional Differencing', Biometrika, 68: 165-76.

Huizinga, John (1987), 'An Empirical Investigation of the Long-Run Behavior of Real Exchange Rates', Carnegie-Rochester Conference Series on Public Policy, 27: 149-214.

Hunter, John (1992), 'Tests of Cointegrating Exogeneity for PPP and Uncovered Interest Rate Parity in the United Kingdom', Journal of Policy Modeling, 14: 453-63.

Hylleberg, Svend (1994), 'The Economics of Seasonal Cycles: A Comment', in C. A. Sims (ed.), Advances in Econometrics, Sixth World Congress, I, Cambridge University Press, 252-5.

Engle, R. F., Granger, C. W. J., and Yoo, B. S. (1990), 'Seasonal Integration and Cointegration', Journal of Econometrics, 44: 215-38.

and Mizon, Grayham E. (1989), 'Cointegration and Error Correction Mechanisms', Economic Journal, 99 (Suppl.), 113-25.

Inder, Brett (1993), 'Estimating Long-Run Relationships in Economics', Journal of Econometrics, 57: 53-68.

Johansen, Søren (1988), 'Statistical Analysis of Cointegration Vectors', Journal of Economic Dynamics and Control, 12: 231-54.

(1991a), 'Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Model', Econometrica, 59: 1551-80.

(1991b), 'Statistical Analysis of Cointegration Vectors', in R. F. Engle and C. W. J. Granger (eds.), Long-run Economic Relationships, Readings in Cointegration, Oxford University Press, 131-52.

(1992a), 'A Representation of Vector Autoregressive Processes Integrated of Order 2', Econometric Theory, 8: 188-202.

(1992b), 'Determination of Cointegration Rank in the Presence of a Linear Trend', Oxford Bulletin of Economics and Statistics, 54: 383-97.

(1992c), 'Cointegration in Partial Systems and the Efficiency of Single-Equation Analysis', Journal of Econometrics, 52: 389-402.

Johansen, Søren (1992d), 'Testing Weak Exogeneity and the Order of Cointegration in UK Money Demand Data', Journal of Policy Modeling, 14: 313-34.

(1992e), 'A Statistical Analysis of Cointegration for I(2) Variables', mimeo.

(1994), 'The Role of the Constant Term in Cointegration Analysis of Nonstationary Variables', Econometric Reviews, 13: 205-19.

and Juselius, Katarina (1990), 'Maximum Likelihood Estimation and Inference on Cointegration - with Applications to the Demand for Money', Oxford Bulletin of Economics and Statistics, 52: 169-210.

(1992), 'Some Structural Hypotheses in a Multivariate Cointegration Analysis of the Purchasing Power Parity and the Uncovered Interest Parity for U.K.', Journal of Econometrics, 53: 211-44.

Judge, George G., Griffiths, W. E., Hill, R. Carter, Lütkepohl, Helmut, and Lee, Tsoung-Chao (1985), The Theory and Practice of Econometrics, 2nd edn., John Wiley, New York.

Juselius, Katarina (1991), 'Long Run Relations in a Well Defined Statistical Model for the Data Generating Process, Cointegration Analysis of the PPP and the UIP Relation', in J. Gruber (ed.), Econometric Decision Models: New Methods of Modelling and Applications, Springer-Verlag, Berlin, 336-57.

(1994), 'On the Duality between Long-Run Relations and Common Trends in I(1) versus I(2) Model: An Application to Aggregate Money Holding', Econometric Reviews, 13: 151-78.

Kailath, Thomas (1980), Linear Systems, Prentice-Hall, New York.

Kariya, Takeaki (1980), 'Locally Robust Tests for Serial Correlation in Least Squares Regression', Annals of Statistics, 8: 1065-70.

Karlin, Samuel (1975), A First Course in Stochastic Processes, 2nd edn., Academic Press, New York.

Kasa, Kenneth (1992), 'Common Stochastic Trends in International Stock Markets', Journal of Monetary Economics, 29: 95-124.

Kennan, John (1979), 'The Estimation of Partial Adjustment Models with Rational Expectations', Econometrica, 47: 1441-55.

Kim, Yoonbai (1990), 'Purchasing Power Parity in the Long Run: A Cointegration Approach', Journal of Money, Credit, and Banking, 22: 491-503.

Kim, Myung Jig, Nelson, Charles R., and Startz, Richard (1991), 'Mean Reversion in Stock Prices? A Reappraisal of the Empirical Evidence', Review of Economic Studies, 58: 515-28.

King, M. L. (1980), 'Robust Tests for Spherical Symmetry and Their Application to Least Squares Regression', Annals of Statistics, 8: 1265-71.

and Hillier, Grant H. (1985), 'Locally Best Invariant Tests of the Error Covariance Matrix of the Linear Regression Model', Journal of the Royal Statistical Society, Series B, 47: 98-102.

King, Robert G., Plosser, Charles I., Stock, James H., and Watson, Mark W. (1991), 'Stochastic Trends and Economic Fluctuations', American Economic Review, 81: 819-40.

Kirchgässner, Gebhard (1991), 'Comment on G. E. Mizon, "Modelling Relative Price Variability and Aggregate Inflation in the United Kingdom"', Scandinavian Journal of Economics, 93: 189-211.

Kleibergen, Frank, and Van Dijk, Herman K. (1992), 'On the Shape of the Likelihood/Posterior in Cointegration Models', Econometric Theory, 10: 514-51.

Kleibergen, Frank, and Van Dijk, Herman K. (1994), 'Direct Cointegrating Testing in Error Correction Models', Journal of Econometrics, 63: 61-103.

Kleidon, Allan W. (1986), 'Variance Bounds Tests and Stock Price Valuation Models', Journal of Political Economy, 94: 953-1001.

Konishi, Toru, and Granger, Clive W. J. (1992), 'Separation in Cointegrated Systems', mimeo.

Ramey, Valerie A., and Granger, Clive W. J. (1993), 'Stochastic Trends and Short-Run Relationships between Financial Variables and Real Activity', NBER Working Paper No. 4275.

Koop, G. (1992), '"Objective" Bayesian Unit Root Tests', Journal of Applied Econometrics, 7: 65-82.

and Steel, Mark F. (1994), 'A Decision-Theoretic Analysis of the Unit-Root Hypothesis Using Mixtures of Elliptical Models', Journal of Business and Economic Statistics, 12: 95-107.

Kormendi, Roger, and Meguire, Philip (1990), 'A Multicountry Characterization of the Nonstationarity of Aggregate Output', Journal of Money, Credit, and Banking, 22: 77-93.

Koto, Yasuji, and Hatanaka, Michio (1994), 'A Simulation Study of the P-values Discrimination between the Difference and Trend Stationarity', mimeo.

Krämer, Walter (1986), 'Least Squares Regression when the Independent Variable follows an ARIMA Process', Journal of the American Statistical Association, 81: 150-4.

Kremers, Jeroen J. M., Ericsson, Neil R., and Dolado, Juan J. (1992), 'The Power of Cointegration Tests', Oxford Bulletin of Economics and Statistics, 54: 325-48.

Kugler, Peter, and Lenz, Carlos (1993), 'Multivariate Cointegration Analysis and the Long-Run Validity of PPP', Review of Economics and Statistics, 75: 180-4.

Kunitomo, Naoto (1994), 'Tests of Unit Roots and Cointegration Hypotheses in Econometric Models', mimeo.

and Yamamoto, Taku (1990), 'Conditions on Consistency by Vector Autoregressive Models and Cointegration', Economic Studies Quarterly, 41: 15-33.

Kunst, Robert, and Neusser, Klaus (1990), 'Cointegration in a Macroeconomic System', Journal of Applied Econometrics, 5: 351-65.

Kwiatkowski, Denis, Phillips, Peter C. B., Schmidt, Peter, and Shin, Yongcheol (1992), 'Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure are We that Economic Time Series Have a Unit Root?', Journal of Econometrics, 54: 159-78.

Lamotte, Lynn Roy, and McWhorter Jr., Archer (1978), 'An Exact Test for the Presence of Random Walk Coefficients in a Linear Regression Model', Journal of the American Statistical Association, 73: 816-20.

Layton, Allan P., and Stark, Jonathan P. (1990), 'Cointegration as an Empirical Test of Purchasing Power Parity', Journal of Macroeconomics, 12: 125-36.

Leamer, Edward E. (1978), Specification Searches, John Wiley, New York.

(1991), 'Comment on "To Criticize the Critics"', Journal of Applied Econometrics, 6: 371-3.

Lee, Inpyo, and Hamada, Koichi (1991), 'International Monetary Regimes and Times Series Properties of Macroeconomic Behavior: A Cointegration Approach', mimeo.

Lee, Tae-Hwy (1992), 'Stock-Flow Relationships in US Housing Construction', Oxford Bulletin of Economics and Statistics, 54: 419-30.

LeRoy, Stephen F. (1973), 'Risk Aversion and the Martingale Property of Stock Prices', International Economic Review, 14: 436-46.

Levin, Andrew, and Lin, Chien-Fu (1992), 'Unit Root Tests in Panel Data: Asymptotic and Finite-Sample Properties', mimeo.

Leybourne, S. J., and McCabe, B. P. M. (1993), 'A Simple Test for Cointegration', Oxford Bulletin of Economics and Statistics, 55: 97-103.

(1994), 'A Consistent Test for a Unit Root', Journal of Business and Economic Statistics, 12: 157-66.

Lim, Kian-Guan, and Phoon, Kok-Fai (1991), 'Tests of Rational Bubbles using Cointegration Theory', Applied Financial Economics, 1: 85-7.

Lindley, D. V. (1965), Introduction to Probability and Statistics, Pt. 2, Inference, Cambridge University Press.

Lippi, Marco, and Reichlin, Lucrezia (1992), 'On Persistence of Shocks to Economic Variables', Journal of Monetary Economics, 29: 87-93.

Lo, Andrew W. (1991), 'Long-Term Memory in Stock Market Prices', Econometrica, 59: 1279-313.

and MacKinlay, A. Craig (1989), 'The Size and Power of the Variance Ratio Test in Finite Samples, A Monte Carlo Investigation', Journal of Econometrics, 40: 203-38.

Lucas, Robert E. Jr. (1978), 'Asset Prices in an Exchange Economy', Econometrica, 46: 1429-45.

Lütkepohl, Helmut (1991), Introduction to Multiple Time Series Analysis, Springer-Verlag, Berlin.

and Reimers, Hans-Eggert (1992), 'Impulse Response Analysis of Cointegrated Systems', Journal of Economic Dynamics and Control, 16: 53-78.

McAleer, Michael, McKenzie, C. R., and Pesaran, M. Hashem (1994), 'Cointegration and Direct Tests of the Rational Expectations Hypothesis', Econometric Reviews, 13: 231-58.

McDermott, C. J. (1990), 'Cointegration: Origins and Significance for Economists', New Zealand Economic Papers, 24: 1-23.

McKinnon, Donald I., Radcliffe, Christopher, Tan, Kong-Yam, Warga, Arthur D., and Willet, Thomas D. (1984), 'International Influences on the US Economy: Summary of an Exchange', American Economic Review, 74: 1132-4.

MacNeill, Ian B. (1978), 'Properties of Sequences of Partial Sums of Polynomial Regression Residuals with Applications to Tests for Change of Regression at Unknown Times', Annals of Statistics, 6: 422-33.

McNown, Robert, and Wallace, Myles S. (1989), 'National Price Levels, Purchasing Power Parity, and Cointegration', Journal of International Money and Finance, 8: 533-45.

(1992), 'Cointegration Tests of a Long-Run Relation between Money Demand and the Effective Exchange Rate', Journal of International Money and Finance, 11: 107-14.

Maekawa, K., Yamamoto, T., Takeuchi, Y., and Hatanaka, M. (1993), 'Estimation in Dynamic Regression with an Integrated Process', mimeo.

Mankiw, N. Gregory, and Shapiro, Matthew (1985), 'Trends, Random Walks and Tests of the Permanent Income Hypothesis', Journal of Monetary Economics, 16: 163-74.

Mark, Nelson C. (1990), 'Real and Nominal Exchange Rates in the Long Run: An Empirical Investigation', Journal of International Economics, 28: 115-36.

Meese, R. A., and Rogoff, K. (1988), 'Was It Real? The Exchange Rate-Interest Differential Relation over the Modern Floating Rate Period', Journal of Finance, 43: 933-48.

Mellander, E., Vredin, A., and Warne, A. (1992), 'Stochastic Trends and Economic Fluctuations in a Small Open Economy', Journal of Applied Econometrics, 7: 369-94.

Mills, Terence C. (1992), 'How Robust is the Finding that Innovations to UK Output are Persistent?', Scottish Journal of Political Economy, 39: 154-66.

Mizon, Grayham E. (1977), 'Model Selection Procedures', in M. J. Artis and A. R. Nobay (eds.), Studies in Modern Economic Analysis, Blackwell, Oxford, 97-120.

(1984), 'The Encompassing Approach in Econometrics', in D. F. Hendry and K. F. Wallis (eds.), Econometrics and Quantitative Economics, Blackwell, Oxford, 135-72.

(1991), 'Modelling Relative Price Variability and Aggregate Inflation in the United Kingdom', Scandinavian Journal of Economics, 93: 189-211.

and Richard, Jean-François (1986), 'The Encompassing Principle and its Application to Testing Non-nested Hypotheses', Econometrica, 54: 657-78.

Morimune, Kimio, and Mantani, Akihisa (1993), 'The Order of the Vector Autoregressive Process with Unit Roots', mimeo.

(1995), 'Estimating the Rank of Co-integration after Estimating the Order of a Vector Autoregression', Japanese Economic Review, forthcoming.

Mosconi, Rocco, and Giannini, Carlo (1992), 'Non-Causality in Cointegrated Systems: Representation, Estimation and Testing', Oxford Bulletin of Economics and Statistics, 54: 399-417.

Nabeya, Seiji, and Tanaka, Katsuto (1988), 'Asymptotic Theory of a Test for the Constancy of Regression Coefficients against the Random Walk Alternative', Annals of Statistics, 16: 218-35.

(1990), 'A General Approach to the Limiting Distribution for Estimators in Time Series Regression with Nonstable Autoregressive Errors', Econometrica, 58: 145-63.

Neave, Henry R. (1972), 'Observations on "Spectral Analysis of Short Series: A Simulation Study" by Granger and Hughes', Journal of the Royal Statistical Society, Series A, 135: 393-405.

Nelson, Charles R., and Plosser, Charles I. (1982), 'Trends and Random Walks in Macroeconomic Time Series', Journal of Monetary Economics, 10: 139-62.

Neusser, Klaus (1991), 'Testing the Long-Run Implications of the Neoclassical Growth Models', Journal of Monetary Economics, 27: 3-27.

Nickell, Stephen (1985), 'Error Correction, Partial Adjustment and all That: An Expository Note', Oxford Bulletin of Economics and Statistics, 47: 119-29.

Nowak, Eugen (1991), 'Discovering Hidden Cointegration', mimeo.

Nyblom, Jukka (1986), 'Testing for Deterministic Linear Trend in Time Series', Journal of the American Statistical Association, 81: 545-9.

and Mäkeläinen, Timo (1983), 'Comparisons of Tests for the Presence of Random Walk Coefficients in a Simple Linear Model', Journal of the American Statistical Association, 78: 856-64.

Ogaki, Masao (1992), 'Engel's Law and Cointegration', Journal of Political Economy, 100: 1027-46.

and Park, Joon Y., 'A Cointegration Approach to Estimating Preference Parameters', mimeo.

Osborn, Denise R. (1990), 'A Survey of Seasonality in U.K. Macroeconomic Variables', International Journal of Forecasting, 6: 327-36.

Osterwald-Lenum, Michael (1992), 'A Note on Quantiles of the Asymptotic Distribution of the Maximum Likelihood Cointegration Rank Test Statistic', Oxford Bulletin of Economics and Statistics, 54: 461-72.

Ouliaris, Sam, Park, Joon Y., and Phillips, Peter C. B. (1989), 'Testing for a Unit Root in the Presence of a Maintained Trend', in B. Raj (ed.), Advances in Econometrics and Modeling, Kluwer, Kingston-upon-Thames, 7-28.

Pagan, Adrian (1985), 'Time Series Behaviour and Dynamic Specification', Oxford Bulletin of Economics and Statistics, 47: 199-211.

and Wickens, M. R. (1989), 'A Survey of Some Recent Econometric Methods', Economic Journal, 99: 962-1025.

Pantula, Sastry G., and Hall, Alastair (1991), 'Testing for Unit Roots in Autoregressive Moving Average Models', Journal of Econometrics, 48: 325-53.

Park, Joon Y. (1992), 'Canonical Cointegrating Regressions', Econometrica, 60: 119-43.

and Phillips, Peter C. B. (1988), 'Statistical Inference in Regressions with Integrated Processes': 1, Econometric Theory, 4: 468-97.

(1989), 'Statistical Inference in Regressions with Integrated Processes': 2, Econometric Theory, 5: 95-131.

Paulsen, Jostein (1984), 'Order Determination of Multivariate Autoregressive Time Series with Unit Roots', Journal of Time Series Analysis, 5: 115-27.

Patel, Jayendu (1990), 'Purchasing Power Parity as a Long-Run Relation', Journal of Applied Econometrics, 5: 367-79.

Perron, Pierre (1988), 'Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach', Journal of Economic Dynamics and Control, 12: 297-332.

(1989a), 'The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis', Econometrica, 57: 1361-401.

(1989b), 'Testing for a Random Walk: A Simulation Experiment of Power when the Sampling Interval is Varied', in B. Raj (ed.), Advances in Econometrics and Modelling, Kluwer, Kingston-upon-Thames, 47-68.

(1990), 'Testing for a Unit Root in a Time Series with a Changing Mean', Journal of Business and Economic Statistics, 8: 153-62.

(1991a), 'Test Consistency with Varying Sampling Frequency', Econometric Theory, 7: 341-68.

(1991b), 'A Test for Changes in a Polynomial Trend Function for a Dynamic Time Series', mimeo.

and Vogelsang, Timothy J. (1992a), 'Nonstationarity and Level Shifts with an Application to Purchasing Power Parity', Journal of Business and Economic Statistics, 10: 301-20.

(1992b), 'Testing for a Unit Root in a Time Series with a Changing Mean: Corrections and Extensions', Journal of Business and Economic Statistics, 10: 467-70.

Pesaran, M. Hashem (1987), The Limits to Rational Expectations, Blackwell, Oxford.

Phillips, P. C. B. (1986), 'Understanding Spurious Regressions in Econometrics', Journal of Econometrics, 33: 311-40.

(1987), 'Time Series Regression with a Unit Root', Econometrica, 55: 277-301.

(1988a), 'Weak Convergence of Sample Covariance Matrices to Stochastic Integrals via Martingale Approximations', Econometric Theory, 4: 528-3.

(1988b), 'Multiple Regression with Integrated Time Series', Contemporary Mathematics, 80: 79-105.

(1989), 'Partially Identified Econometric Models', Econometric Theory, 5: 181-240.

(1990), 'Solution', Econometric Review, 6: 431-3.

(1991a), 'Optimal Inference in Cointegrated Systems', Econometrica, 59: 283-306.

Phillips, P. C. B. (1991b), 'To Criticize the Critics: An Objective Bayesian Analysis of Stochastic Trends', Journal of Applied Econometrics, 6: 333-64.

(1991c), 'Bayesian Routes and Unit Roots: De Rebus Prioribus Semper Est Disputandum', Journal of Applied Econometrics, 6: 435-73.

(1991d), 'Unidentified Components in Reduced Rank Regression Estimation of ECM's', mimeo.

(1992a), 'The Long-Run Australian Consumption Function Reexamined: An Empirical Experience in Bayesian Inference', in C. Hargreaves (ed.), Macroeconomic Modelling of the Long Run, Edward Elgar, Cheltenham, 287-322.

(1992b), 'Bayesian Model Selection and Prediction with Empirical Applications', mimeo.

(1993), 'Fully Modified Least Squares and Vector Autoregression', mimeo.

and Durlauf, S. N. (1986), 'Multiple Time Series Regression with Integrated Processes', Review of Economic Studies, 53: 473-95.

(1994), 'Some Exact Distribution Theory for Maximum Likelihood Estimators of Cointegrating Coefficients in Error Correction Models', Econometrica, 62: 73-93.

and Hansen, Bruce E. (1990), 'Statistical Inference in Instrumental Variables Regression with I(1) Processes', Review of Economic Studies, 57: 99-125.

and Loretan, Mico (1991), 'Estimating Long-Run Economic Equilibria', Review of Economic Studies, 58: 407-36.

and Ouliaris, S. (1990), 'Asymptotic Properties of Residual Based Tests for Cointegration', Econometrica, 58: 165-93.

and Park, Joon Y. (1988), 'Asymptotic Equivalence of Ordinary Least Squares and Generalized Least Squares in Regressions with Integrated Regressors', Journal of the American Statistical Association, 83: 111-15.

and Perron, Pierre (1988), 'Testing for a Unit Root in Time Series Regression', Biometrika, 75: 335-46.

and Ploberger, W. (1992), 'Posterior Odds Testing for a Unit Root with Data-Based Model Selection', mimeo.

Poirier, Dale J. (1988), 'Frequentist and Subjectivist Perspectives on the Problem of Model Building in Economics', Journal of Economic Perspectives, 2/1: 121-44.

(1991), 'A Comment on "To Criticize the Critics: An Objective Bayesian Analysis of Stochastic Trends"', Journal of Applied Econometrics, 6: 381-6.

Pollard, David (1984), Convergence of Stochastic Processes, Springer-Verlag, Berlin.

Poterba, James M., and Summers, Lawrence H. (1988), 'Mean Reversion in Stock Prices, Evidence and Implications', Journal of Financial Economics, 22: 27-59.

Pratt, John W. (1965), 'Bayesian Interpretation of Standard Inference Statements', Journal of the Royal Statistical Society, Series B, 27: 169-92.

Quah, Danny (1990), 'Permanent and Transitory Movements in Labor Income: An Explanation for "Excess Smoothness" in Consumption', Journal of Political Economy, 98: 449-75.

(1993), 'Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data', mimeo.

Raj, Baldev (1992), 'International Evidence in Persistence in Output in the Presence of an Episodic Change', Journal of Applied Econometrics, 7: 281-93.

Rappoport, Peter, and Reichlin, Lucrezia (1989), 'Segmented Trends and Non-Stationary Time Series', Economic Journal, 99 (Conference): 168-77.

Reinsel, Gregory C., and Ahn, Sung K. (1992), 'Vector Autoregressive Models with Unit Roots and Reduced Rank Structure: Estimation, Likelihood Ratio Test, and Forecasting', Journal of Time Series Analysis, 13: 353-75.

Romer, Christina D. (1986a), 'Spurious Volatility in Historical Unemployment Data', Journal of Political Economy, 94: 1-37.

(1986b), 'Is the Stabilization of the Postwar Economy a Figment of the Data?', American Economic Review, 76: 314-34.

(1989), 'The Prewar Business Cycle Reconsidered: New Estimates of Gross National Product, 1869-1908', Journal of Political Economy, 97: 1-37.

Rose, Andrew K. (1988), 'Is the Real Interest Rate Stable?', Journal of Finance, 43: 1095-112.

Ross, Sheldon M. (1983), Stochastic Processes, John Wiley, New York.

Rudebusch, Glenn D. (1992), 'Trends and Random Walks in Macroeconomic Time Series: A Re-Examination', International Economic Review, 33: 661-80.

Said, Said E., and Dickey, David A. (1984), 'Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order', Biometrika, 71: 599-607.

Saikkonen, Pentti (1991), 'Asymptotically Efficient Estimation of Cointegration Regression', Econometric Theory, 7: 1-21.

(1993), 'Estimation of Cointegration Vectors with Linear Restrictions', Econometric Theory, 9: 19-35.

and Luukkonen, Ritva (1993a), 'Testing for a Moving Average Unit Root in Autoregressive Integrated Moving Average Models', Journal of the American Statistical Association, 88: 596-601.

(1993b), 'Point Optimal Tests for Testing the Order of Differencing in ARIMA Models', Econometric Theory, 9: 343-62.

Salmon, Mark (1982), 'Error Correction Mechanisms', Economic Journal, 92: 615-29.

Sargan, J. D. (1964), 'Wages and Prices in the United Kingdom: A Study in Econometric Methodology', repr. in D. F. Hendry and K. F. Wallis (eds.), Econometrics and Quantitative Economics, Blackwell, Oxford, 275-314.

Sargent, Thomas (1979), Macroeconomic Theory, Academic Press, New York.

Schmidt, Peter, and Phillips, Peter C. B. (1992), 'LM Tests for a Unit Root in the Presence of Deterministic Trends', Oxford Bulletin of Economics and Statistics, 54: 257-87.

Schotman, Peter C., and Van Dijk, Herman K. (1991a), 'A Bayesian Analysis of the Unit Root in Real Exchange Rates', Journal of Econometrics, 49: 195-238.

(1991b), 'On Bayesian Routes to Unit Roots', Journal of Applied Econometrics, 6: 387-401.

(1993), 'Posterior Analysis of Possibly Integrated Time Series with an Application to Real GNP', in D. Brillinger, P. Caines, J. Geweke, E. Parzen, M. Rosenblatt, and M. Taqqu (eds.), New Directions in Time Series Analysis, Springer-Verlag, 341-63.

Schwarz, Gideon (1978), 'Estimating the Dimension of a Model', Annals of Statistics, 6: 461-4.

Schwert, G. William (1987), 'Effects of Model Specification on Tests for Unit Roots in Macroeconomic Data', Journal of Monetary Economics, 20: 73-103.

(1989), 'Tests for Unit Roots: A Monte Carlo Investigation', Journal of Business and Economic Statistics, 7: 147-59.

Shapiro, Matthew D., and Watson, Mark W. (1988), 'Sources of Business Cycle Fluctuations', NBER Macroeconomics Annual, 3: 111-48.

Shiller, Robert J. (1981a), 'Do Stock Prices Move Too Much to Be Justified by Subsequent Changes in Dividends?', American Economic Review, 71: 421-36.

(1981b), 'Alternative Tests of Rational Expectation Models: The Case of Term Structures', Journal of Econometrics, 16: 71-87.

Shintani, Mototsugu (1994), 'Cointegration and Tests of the Permanent Income Hypothesis: Japanese Evidence with International Comparisons', Journal of the Japanese and International Economies, 8: 144-72.

Sims, Christopher A. (1972), 'Money, Income, and Causality', American Economic Review, 62: 540-52.

(1980a), 'Martingale-Like Behavior of Prices', Working Paper No. 489, NBER.

(1980b), 'Macroeconomics and Reality', Econometrica, 48: 1-48.

(1988), 'Bayesian Skepticism on Unit Root Econometrics', Journal of Economic Dynamics and Control, 12: 463-74.

(1991), 'Comment by Christopher A. Sims on "To Criticize the Critics", by Peter C. B. Phillips', Journal of Applied Econometrics, 6: 423-34.

Stock, James H., and Watson, Mark W. (1990), 'Inference in Linear Time Series Models with Some Unit Roots', Econometrica, 58: 113-44.

and Uhlig, Harald (1991), 'Understanding Unit Rooters: A Helicopter Tour', Econometrica, 59: 1591-9.

Solo, Victor (1984), 'The Order of Differencing in ARIMA Models', Journal of the American Statistical Association, 79: 916-21.

Spanos, Aris (1986), Statistical Foundations of Economic Modelling, Cambridge University Press.

Stock, James H. (1987), 'Asymptotic Properties of Least Squares Estimators of Cointegrating Vectors', Econometrica, 55: 1035-56.

(1992), 'Deciding between I(1) and I(0)', NBER, Technical Working Paper No. 121.

and Watson, Mark W. (1988a), 'Variable Trends in Economic Time Series', Journal of Economic Perspectives, 2/3: 147-74, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships, Readings in Cointegration, Oxford University Press.

(1988ft), 'Testing for Common Trends', Journal of the American StatisticalAssociation, 83: 1097-107, repr. in R. F. Engle and C. W. J. Granger (eds.), Long-RunEconomic Relationships, Readings in Cointegration, Oxford University Press.

(1989), 'Interpreting the Evidence on Money-Income Causality', Journal ofEconometrics, 40: 161-81.

(1993), 'A Simple Estimator of Cointegrating Vectors in Higher Order Inte-grated Systems', Econometrica, 61: 783-820.

Stock, James H. and West, Kenneth D. (1988), 'Integrated Regressors and Tests of the Permanent-Income Hypothesis', Journal of Monetary Economics, 21: 85-95.

Stultz, Rene M., and Wasserfallen, Walter (1985), 'Macroeconomic Time Series, Business Cycles and Macroeconomic Policies', Carnegie-Rochester Conference Series on Public Policy, 22: 9-54.

Sugihara, Soichi (1994), 'Error Correction Representation of a Dynamic Equation Model and its Estimation', mimeo. in Japanese.

Takeuchi, Yoshiyuki (1991), 'Trends and Structural Changes in Macroeconomic Time Series', Journal of the Japan Statistical Society, 21: 13-25.

Tanaka, Katsuto (1990), 'Testing for a Moving Average Unit Root', Econometric Theory, 6: 433-44.

Tanaka, Katsuto (1993), 'An Alternative Approach to the Asymptotic Theory of Spurious Regression, Cointegration, and Near Cointegration', Econometric Theory, 9: 36-61.

(1995a), 'The Optimality of Extended Score Tests with Applications to Testing for a Moving Average Unit Root', in S. Maddala, P. C. B. Phillips, and T. N. Srinivasan (eds.), Advances in Econometrics, Blackwell, Oxford.

(1995b), Nonstationary and Noninvertible Time Series Analysis: A Distribution Theory, Wiley, New York, forthcoming.

Taylor, Mark P. (1988), 'An Empirical Examination of Long-Run Purchasing Power Parity using Cointegration Techniques', Applied Economics, 20: 1369-81.

Toda, Hiro Y. (1994), 'Finite Sample Properties of Likelihood Ratio Tests for Cointegrating Ranks when Linear Trends are Present', Review of Economics and Statistics, 76: 66-79.

(1995), 'Finite Sample Performance of Likelihood Ratio Tests for Cointegrating Ranks in Vector Autoregressions', Econometric Theory, forthcoming.

and McKenzie, C. R. (1994), 'LM Tests for Unit Roots in the Presence of Missing Observations', mimeo.

and Phillips, P. C. B. (1993), 'Vector Autoregression and Causality', Econometrica, 61: 1367-93.

(1994), 'Vector Autoregression and Causality: A Theoretical Overview and Simulation Study', Econometric Reviews, 13: 259-85.

and Yamamoto, Taku (1995), 'Statistical Inference in Vector Autoregressions with Possibly Integrated Processes', Journal of Econometrics, forthcoming.

Tsay, Ruey S. (1988), 'Outliers, Level Shifts, and Variance Changes in Time Series', Journal of Forecasting, 7: 1-20.

Tsurumi, Hiroki, and Wago, Hajime (1993), 'A Bayesian Analysis of Unit Root under Unknown Order of an ARMA(p, q) Error', mimeo.

(1994), 'A Bayesian Analysis of Unit Root and Cointegration with an Application to a Yen-Dollar Exchange Rate Model', mimeo.

Wago, Hajime, and Tsurumi, Hiroki (1991), 'A Bayesian Analysis of Unit Root and Stationarity Hypotheses with an Application to the Exchange Rate for Yen', mimeo.

Walker, A. M. (1969), 'On the Asymptotic Behaviour of Posterior Distributions', Journal of the Royal Statistical Society, Series B, 31: 80-8.

Wallace, Myles S., and Warner, John T. (1993), 'The Fisher Effect and the Structure of Interest Rates: Tests of Cointegration', Review of Economics and Statistics, 75: 320-4.

Wasserfallen, Walter (1986), 'Non-stationarities in Macro-economic Time Series: Further Evidence and Implications', Canadian Journal of Economics, 19: 498-510.

Watson, Mark W. (1986), 'Univariate Detrending Methods with Stochastic Trends', Journal of Monetary Economics, 18: 49-75.

West, Kenneth D. (1988), 'Asymptotic Normality When Regressors Have a Unit Root', Econometrica, 56: 1397-417.

White, John S. (1958), 'The Limiting Distribution of the Serial Correlation Coefficient in the Explosive Case', Annals of Mathematical Statistics, 29: 1188-97.

Wickens, Michael R. (1993), 'Rational Expectations and Integrated Variables', in P. C. B. Phillips (ed.), Models, Methods, and Applications of Econometrics: Essays in Honour of A. R. Bergstrom, Blackwell, Oxford, 317-36.

Wooldridge, Jeffrey M. (1991), 'Notes on Regression with Difference-Stationary Data', mimeo.

Yamamoto, Taku (1988), Time Series Analysis in Economics, Sobunsha, Tokyo (in Japanese).

Zellner, Arnold (1971), An Introduction to Bayesian Inference in Econometrics, John Wiley, New York.

(1986), 'On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions', in P. Goel and A. Zellner (eds.), Bayesian Inference and Decision Techniques, 233-43.

Zivot, Eric, and Andrews, Donald W. (1992), 'Further Evidence on the Great Crash, the Oil-Price Shock, and Unit-Root Hypothesis', Journal of Business and Economic Statistics, 10: 251-70.

Subject Index

adjustment matrix 142
Akaike information criterion 227

balanced equation 165
balanced growth version 153
Bartlett window 31-2
Bayesian inference 102-4
  comparison with classical inference 105-7, 111-12
  likelihood function 107-8
  prior distribution 108-9
bubbles 201

central trend line 5, 91
Cochrane Orcutt transformation 189, 194, 208
co-integration:
  direction of 196
  in error correction form 142
  in MA representation 136-7
  as a statistical model 150
  in VAR representation 141-2
  when I(0) variables are involved 135, 137, 160-1
co-integration rank 137
  determination of 221-7
co-integrated regression 168-9
  with a correlated error 172-4, 190-2
  as a special case of VAR 195-7, 232, 238-9
  with an uncorrelated error 170-2, 188-90
co-integration space 137
  testable properties of 231-5
co-integrating vector 137
  a priori specified 198-9
common trends 138, 147-8, 242
consumption function 11, 194, 201, 245
CPI 23, 99-100

deterministic trends 74
  in co-integration 139-40, 149, 225-6, 239
  in co-integrated regression 178-87
  in discrimination between DS and TS 16, 91-2
  in limit distributions 51-3
  in unit root tests 47-8, 50
Dickey-Fuller test 48-9, 53, 56-7, 63, 251-4
  in encompassing analysis 82
difference stationarity (DS) 7, 16
discrimination between DS and TS 17, 24, 92-101
  as a model selection 77-8, 82-9
dominating root 22, 99
duality 146
dynamic equation 205
  with single-step method 215-17
  with two-step method 209-11

efficient market hypothesis 9, 159
encompassing analysis 79-80
  by P-values 81-2, 85-9
equilibrium 142, 157-60
  errors in 148, 158
  stability of 148, 157
error-correction model 13, 142
  as an economic model 151-2
  see also Hendry model
exchange rate 10, 201, 245
explosive root 23, 104

fully modified OLS 55, 265-7

general-to-specific principle 57, 78
Granger non-causality 129-30
  in co-integrated regression 191-2
  on common trends 148
  on equilibrium error 158
  in the long run 238-9
  in nonstationary processes 134, 149-50
  on restoring equilibrium 158-60
  in stationary processes 130-4
  testing 240-1
Granger representation theorem 140-6

Hendry model 206
  with single-step method 211-14
  with two-step method 207-8
historical data 25, 70, 93-5

identification:
  co-integration space 138, 231-5, 242
  lack of, on co-integration matrix 137, 147, 224, 236
  lack of, on level changes 76, 79
  in simultaneous equations 121, 124
impulse-response function 5, 123, 236
innovation 6
instrumental variables 58-9, 64-5
integration order 17, 23

Johansen method 221-6, 232-5
  applications 241-6

lag orders 57-8, 227-8
locally best invariance (LBI) test 34-5
LBIU (locally best invariance unbiased) test 35
location of non-singular submatrix 195-6, 232, 237-9
logarithmic transformation 27
long-run component 18-21, 136-7
long-run covariance matrix 138, 166
  in limit distributions 166-7, 187-8
  non-parametric estimation 266
long-run parameter 160, 206-7, 209-10, 211-14
long-run relations 12-13, 137, 151, 160-1, 198
long-run variance 20-1, 57, 189
  in limit distributions 54
  non-parametric estimation 68-9

MA unit root 19, 20, 21, 36
  test of 35-9, 59-61
martingale process 6, 159
A test 83-4
mixed Gaussian 170
monetary relations 243-6

near unit root 15
non-linearity 15
non-nested hypothesis 24
non-parametric variance ratio 28-30
non-stationarity in variance 77

order in probability (Op) 40

partial adjustment 152, 160
point null hypothesis 109, 12
polynomial matrix 140, 260-4
post-war data 25, 70, 95-8
power of test 27, 68, 227
PPP 198-9, 201, 243
prediction 4, 18
productivity 12, 74, 241
present value model 11, 158

random walk 7, 41, 249-50
rational expectations 9, 204-5, 210-11
real business cycle theory 12, 74, 241-3
real consumption 95
real GNP 70, 92, 98
real rate of interest 70, 94, 98, 201
real wage 70, 94, 98
reduced rank regression 219-20
residual based test 200

Saikkonen-Luukkonen test 36-9, 59-62
  in encompassing analysis 83
Schwarz information criterion 227
Schwert ARMA 65, 86, 98
seasonal adjustment 70
seasonal unit root 15
separability between DS and TS 24, 85, 89
serial correlations 54, 187-8
simultaneous equations model 120-1, 174-7
Smith form 263
Smith-McMillan form 262
spectral density function 30-3, 68-9, 247-9
spectral window 30-1, 68-9
spurious regression 168-9, 197, 207
stochastic trend 4
  explanation by economic theories 10
  testable implications 11
stock prices 7-8, 70, 98, 245
structural changes 30, 75-7, 92, 230, 243, 245
structural VAR 123-4

tabulating limit distributions 67
time aggregation 25-7
trend stationarity (TS) 7, 16

UIP 243
unemployment 70, 94, 95-8

VAR model 121-2

wage 201, 245
weak exogeneity 124-8, 210-11, 218, 237-40
West model 186
Wiener process 41, 249-51

Author Index

Agiakloglou, C. 46, 68, 269
Ahmed, S. 124, 269
Ahn, S. K. 49, 225, 269, 284
Akaike, H. 66, 227, 269
Alogoskoufis, G. 151, 269
Anderson, H. M. 199, 245, 276
Anderson, T. W. 31, 37 n., 78, 219, 220, 247, 252, 259, 269
Andrews, D. W. 76, 78 n., 287
Ansley, C. F. 64, 269
Aoki, M. 13 n., 269
Ardeni, P. 198, 269

Baillie, R. T. 70, 159, 201, 245, 269
Balke, N. S. 77, 269
Banerjee, A. 17, 27, 30, 41, 76, 78 n., 140 n., 142 n., 147 n., 162, 165, 169, 186, 192, 204, 220, 221, 224, 240, 242 n., 251, 269, 273
Basawa, I. V. 68, 270
Beaudry, P. 15, 270
Beaulieu, J. J. 15 n., 270
Berger, J. O. 103 n., 104, 109 n., 110 n., 111 n., 112 n., 270
Berger, R. L. 107 n., 271
Bernanke, B. S. 123, 124 n., 270
Beveridge, S. 17, 270
Bewley, R. A. 147 n., 270
Billingsley, P. 250, 270
Blanchard, O. J. 12, 123, 270
Bollerslev, T. 70, 118 n., 159, 245, 269, 270
Boswijk, H. P. 206 n., 270
Box, G. E. P. 7, 12, 13 n., 57, 64, 270
Brillinger, D. R. 247, 270
Brodin, R. A. 245, 270

Campbell, J. Y. 11 n., 13 n., 28 n., 29, 30, 32, 33, 65, 122, 151, 151 n., 158, 198, 270
Canarella, G. 118 n., 271
Casellea, G. 107 n., 271
Chan, N. H. 42 n., 58, 271
Chao, J. C. 118 n., 228, 271
Cheung, Y. W. 15 n., 271
Choi, I. 38 n., 49, 58, 200, 271
Christiano, L. J. 29, 66, 271
Chung, K. L. 250, 271
Clark, P. K. 33, 65 n., 271
Clements, M. P. 23, 244, 245, 271
Cochrane, J. H. 28, 29, 30, 32, 33, 271
Cogley, T. 29, 30, 32, 272
Copeland, L. 159 n., 245, 272
Cox, D. R. 24, 34, 272

Davidson, J. 10, 12, 118 n., 151 n., 152, 206, 272
Deaton, A. 11 n., 271, 272
DeGroot, M. H. 107 n., 272
Deistler, M. 263, 276
DeJong, D. N. 63, 66, 68, 69 n., 70, 90, 108, 112, 118 n., 272
Delampady, M. 110 n., 112 n., 270
DeLong, J. B. 12, 25, 272
Demery, D. 30, 272
Diba, B. T. 23 n., 201, 272
Dickey, D. A. 10, 40, 46, 49, 50, 63, 64, 65, 69, 86, 87, 272, 284
Diebold, F. X. 15 n., 70, 159, 243 n., 272
Dolado, J. 200 n., 204, 207, 269, 273, 279
Drobny, A. 201, 273
Duck, N. W. 30, 272
Durbin, J. 35, 273
Durlauf, S. N. 12, 17, 25, 167, 169, 188, 273
Dweyer, G. P. 159 n., 273

Eichenbaum, M. 29, 66, 271
Elliot, G. 49, 273
Engle, R. F. 13, 117, 118 n., 124, 128, 135, 137, 140, 147 n., 162, 264, 270, 273, 277
Engsted, T. 118 n., 273
Ericsson, N. R. 124, 126 n., 200 n., 207, 273, 279

Fama, E. F. 28 n., 30, 273
Ferguson, T. S. 34, 35, 273
Flavin, M. A. 11, 273
Fomby, T. B. 77, 269
French, K. R. 28 n., 30, 273
Friedman, B. J. 130, 245, 274
Froot, K. A. 23 n., 274
Fukushige, M. 38, 58, 89 n., 107 n., 157, 274
Fuller, W. A. 10, 26 n., 40 n., 42, 44, 45, 46, 49, 50, 53, 56, 63, 64, 68, 157, 199, 247, 258, 272, 274

Galbraith, J. W. 204, 269, 273
Gali, J. 123, 274
Gardeazabal, J. 159, 243 n., 273
Geweke, J. 130, 274

Ghysels, E. 15 n., 70, 274
Giannini, C. 240, 281
Gonzalo, J. 148, 274
Granger, C. W. J. 4, 12, 13, 15 n., 21, 64, 68, 76 n., 117, 118 n., 129, 130, 131, 135, 137, 140, 148, 158, 159, 162, 169, 199, 235, 241 n., 245, 247, 273, 274, 276, 277, 279
Gregory, A. W. 204, 275
Grenander, U. 37 n., 275
Griffiths, W. E. 278
Grossman, H. I. 23 n., 201, 272

Hafer, R. W. 244 n., 275
Hakkio, C. S. 70, 159, 275
Haldrup, N. 118 n., 275
Hall, A. 58, 59, 64, 65, 68, 275, 282
Hall, A. D. 119, 245, 276
Hall, P. 250, 276
Hall, R. E. 10, 186, 194 n., 276
Hall, S. G. 23, 201, 273, 276
Hallman, J. J. 118 n., 273, 274
Hamada, K. 124, 279
Hamilton, J. D. 4, 46, 59, 177, 196 n., 200, 247, 251, 276
Han, H. L. 151, 276
Hannan, E. J. 263, 276
Hansen, B. E. 118 n., 174 n., 181, 18 184, 192, 200, 265, 266, 276, 283
Hargreaves, C. 119 n., 266, 276
Harris, D. 229 n., 176
Harrison, J. M. 10 n., 276
Harvey, A. C. 25, 33, 276
Hatanaka, M. 37, 38, 58, 62, 67, 73, 75, 77, 83, 89 n., 90, 93, 94 n., 95, 99, 107 n., 128 n., 157, 187, 230, 274, 276, 279, 280
Hayashi, F. 10 n., 276
Hendry, D. F. 12, 14, 57, 76, 78, 124, 128, 152, 194, 206, 244, 269, 272, 273, 276
Heyde, C. C. 250, 276
Hill, R. C. 278
Hiller, G. H. 35, 278
Hinkley, D. V. 34, 272
Hoffman, D. J. 235, 243, 277
Hosking, J. R. M. 15 n., 277
Huizinga, J. 28, 277
Hunter, J. 221, 238, 243
Husted, S. 15 n., 273
Hylleberg, S. 15 n., 147 n., 277

Ickes, B. W. 269
Inder, B. 229 n., 266, 276, 277

Jansen, D. W. 244 n., 275
Jenkins, G. M. 7, 12, 57, 67, 270
Johansen, S. 23, 67, 117, 118 n., 139, 140 n., 142 n., 146, 148, 149, 159 n., 162, 165, 195, 197, 198, 199, 201, 210 n., 219, 220, 221, 225, 226, 229, 230, 231, 232, 233, 234, 235, 238, 239, 241, 243, 235, 246, 277
Joyeux, R. 15 n., 274
Judge, G. G. 40, 278
Juselius, K. 23, 67, 118 n., 221, 225, 231, 233, 234, 235, 238, 243, 246, 278

Kailath, T. 261 n., 262, 278
Kariya, T. 35, 278
Karlin, S. 250, 278
Kasa, K. 245, 278
Kennan, J. 118, 204, 205, 278
Kim, M. J. 30, 278
Kim, Y. 201, 278
King, M. L. 12, 35, 278
King, R. G. 123, 138, 151, 241, 242, 243, 278
Kirchgasner, G. 245 n., 278
Kleibergen, F. 118 n., 232, 278
Kleidon, A. W. 11, 279
Konishi, T. 148, 235, 279
Koop, G. 15, 90, 102, 111, 112, 270, 279
Kormendi, R. 25, 29, 30, 279
Koto, Y. 37, 38, 58, 62, 67, 73, 75, 77, 83, 89 n., 90, 93, 94 n., 95, 99, 107 n., 157, 187, 230, 274, 276, 279
Kramer, W. 171, 188, 190, 279
Kremers, J. J. M. 200 n., 206, 279
Kreps, D. M. 10 n., 276
Kugler, P. 243 n., 279
Kunitomo, N. 122, 220, 230, 235, 269, 279
Kunst, R. 243, 279
Kuttner, K. N. 130, 245, 274
Kwiatkowski, D. 35 n., 36, 37, 60, 61, 90 n., 279

Lai, S. K. 118 n., 271
La Motte, L. R. 35, 279
Layton, A. P. 201, 279
Leamer, E. E. 104, 108, 279
Lee, H. S. 118 n., 275
Lee, I. 124, 279
Lee, T. C. 278
Lee, T. H. 118 n., 275, 279
Lenz, C. 243 n., 279
LeRoy, S. F. 10, 280
Levin, A. 49, 118 n., 280
Leybourne, S. J. 37 n., 229 n., 280
Lim, K. G. 201, 280
Lin, C. F. 49, 118 n., 280
Lin, J. L. 241 n., 275
Lindley, D. V. 103 n., 104, 107 n., 112 n., 280
Lippi, M. 33 n., 280
Lo, A. 15 n., 28 n., 29, 280

Loretan, M. 172, 191, 192, 206, 211, 212, 215, 283
Lubian, D. 198, 269
Lucas, R. 10, 280
Lumsdaine, R. L. 30, 76, 78 n., 270
Lutkepohl, H. 122 n., 236, 278, 280
Luukkonen, R. 13, 28, 35 n., 36, 37, 62, 83, 284

McAleer, M. 118 n., 280
McCabe, B. P. M. 37 n., 229 n., 280
McCormick, W. P. 270
McDermott, C. J. 245, 280
McKenzie, C. R. 49, 118 n., 280, 286
MacKinley, A. C. 28 n., 29, 280
McKinnon, D. I. 244, 280
MacNeill, I. B. 61, 280
McNown, R. 201, 243, 244, 246, 280
McWhorter, A. 35, 279
Maekawa, K. 176 n., 280
Makelainen, T. 13, 35, 281
Mallik, A. K. 270
Mankiw, A. K. 11, 28 n., 29, 30, 32, 33, 271, 280
Mantani, A. 122 n., 220, 228, 281
Mark, N. C. 198, 280
Meese, R. A. 201, 280
Meguire, P. 25, 29, 30, 279
Mellander, E. 229, 236, 243, 281
Mills, T. C. 76 n., 281
Miron, J. A. 15 n., 270
Mizon, G. E. 12, 14, 23, 74, 78, 79, 80, 147, 194, 206, 229, 244, 245, 271, 277, 281
Morimune, K. 122 n., 220, 228, 281
Mosconi, R. 240, 281

Nabeya, S. 13, 35, 42 n., 281
Nankervis, J. C. 272
Neale, A. J. 76, 277
Neave, H. R. 68, 281
Nelson, C. R. 5 n., 10, 12, 17, 25, 30, 48, 49, 66, 70, 90 n., 95, 109, 270, 278, 281
Neusser, K. 243, 279, 281
Newbold, P. 4, 12, 64, 66, 68, 130, 169, 247, 269, 275
Nickel, S. 157, 281
Nowak, E. 118 n., 281
Nyblom, J. 13, 35, 281
Nymoen, R. 245, 270

Obstfeld, M. 23 n., 274
Ogaki, M. 128 n., 139, 140, 151, 276, 281
Osborn, D. R. 15 n., 281
Osterwald-Lenum, M. 225, 229, 281
Ouliaris, S. 13, 53, 167, 200, 282, 283

Pagan, A. 157, 204, 206, 275, 277, 282
Pantula, S. G. 64, 65, 68, 282
Park, J. Y. 14, 53, 118 n., 139, 140, 151, 167, 171, 175, 183, 188, 190, 281, 282, 283
Patel, J. 201, 282
Paulsen, J. 227 n., 282
Perron, P. 13, 14, 27, 40, 45, 47, 51, 55, 65, 67, 70, 75, 76, 76 n., 78 n., 271, 274, 282, 283
Pesaran, M. H. 118 n., 157, 280, 282
Phoon, K. F. 201, 280
Phillips, P. C. B. 12 n., 13, 14, 17, 40, 41, 45, 49, 51, 53, 54, 55, 65, 68, 90, 102, 104, 108, 117, 118 n., 123, 165, 166, 167, 168 n., 169, 171, 172, 174 n., 175, 183, 188, 190, 191, 192, 196 n., 197, 200, 206, 211, 212, 214, 215, 220, 221 n., 228, 231, 236, 240, 241, 265, 266, 267, 271, 273, 279, 282, 284, 286
Ploberger, V. 102, 283
Plosser, C. I. 5 n., 10, 12, 25, 48, 49, 66, 70, 90 n., 95, 109, 281
Poirier, D. J. 103 n., 108, 283
Pollard, D. 250, 283
Pollard, S. K. 118 n., 271
Poterba, J. M. 28, 30, 283
Pratt, J. W. 107 n., 283

Quah, D. 11 n., 12, 49, 118 n., 123, 270, 283

Radcliffe, C. 280
Raj, B. 76, 283
Ramsey, V. A. 279
Rappoport, P. 76, 283
Rasche, R. H. 235, 243, 277
Reeves, J. H. 270
Reichlin, L. 33 n., 76, 280, 283
Reimers, H. G. 236, 280
Reinsel, G. C. 225, 269, 284
Richard, J. F. 57, 74, 79, 80, 124, 128, 206, 273, 277, 281
Rogoff, K. 201, 280
Romer, C. D. 25, 284
Rose, A. K. 70, 284
Rosenblatt, M. 37 n., 275
Ross, S. M. 250, 284
Rothenberg, T. J. 49, 273
Rudebusch, G. D. 34, 90, 284

Said, S. E. 64, 65, 69, 86, 87, 284
Saikkonen, P. 13, 28, 35 n., 36, 37, 62, 83, 191, 284
Salmon, M. 152 n., 284
Sargan, D. 152, 206, 277, 284
Sargent, T. 4, 284
Savin, N. E. 272

Schmidt, P. 49, 53, 68, 279, 284
Schotman, P. C. 25, 90, 102, 112, 113, 284
Schwarz, G. 227, 284
Schwert, G. W. 65, 66, 69 n., 70, 86, 87, 284
Sellke, T. 111 n., 270
Selover, D. D. 201, 269
Shapiro, M. 11, 12, 280, 284
Shiller, R. 11, 12 n., 13 n., 122, 151, 158, 198, 271, 285
Shin, Y. 279
Shintani, M. 200, 285
Sims, C. A. 10 n., 102, 104, 107, 108, 117, 121, 124 n., 129, 130, 131, 132, 133, 175, 227, 241 n., 285
Smith, G. W. 204, 275
Smith, R. 151, 269
Solo, V. 64, 285
Spanos, A. 40, 285
Srba, F. 272
Stark, J. P. 201, 279
Startz, R. 30, 278
Steel, M. F. 102, 279
Stock, J. H. 5 n., 30, 49, 76, 78 n., 102, 118 n., 130, 138, 139 n., 151, 169, 175, 186, 191, 193, 227, 241 n., 270, 273, 285
Stultz, R. M. 70, 285
Sugihara, S. 218 n., 285
Summers, L. H. 12, 25, 28, 30, 272, 283

Takeuchi, Y. 101, 280, 285
Tan, K. Y. 280
Tanaka, K. 13, 35 n., 36, 42 n., 229 n., 250, 281, 285
Taylor, M. P. 201, 286
Taylor, R. L. 270
Terasvirta, T. 15, 275
Tiao, G. C. 13 n., 270
Toda, H. Y. 49, 118, 220, 221, 240, 241, 245, 286
Tsay, R. S. 79, 286
Tsurumi, H. 102, 108, 111, 118 n., 286

Uhlig, H. 107, 108, 285

van Dijk, H. K. 25, 90, 102, 112, 113, 118 n., 232, 284
Vogelsang, T. J. 76, 78 n., 282
von Ungern-Sternberg, T. 152, 277
Vredin, A. 230, 236, 243, 281

Wago, H. 102, 108, 111, 118 n., 286
Walker, A. M. 107 n., 286
Wallace, M. S. 159 n., 201, 243, 244, 246, 273, 280, 286
Wang, P. 269
Warga, A. D. 280
Warne, A. 230, 236, 243, 281
Warner, J. T. 244, 286
Wasserfallen, W. 70, 285
Watson, G. S. 35, 273
Watson, M. W. 5 n., 12, 33, 113 n., 130, 138, 139 n., 151, 175, 191, 193, 227, 241 n., 284, 285
Wei, C. Z. 42 n., 58, 271
Weiss, A. A. 13, 275
West, K. D. 46, 179, 186, 285, 286
White, J. S. 41, 286
Whiteman, C. H. 108, 112, 272
Wickens, M. R. 204, 282, 286
Willet, T. D. 280
Wooldridge, J. M. 177, 286

Yamamoto, T. 118, 122, 133, 279, 280, 286
Yeo, S. 272
Yilmaz, K. 159, 243 n., 273
Yoo, B. S. 118 n., 135, 140, 147 n., 264, 269, 273, 277

Zellner, A. 104, 108, 109 n., 112, 287
Zivot, E. 76, 78 n., 287