Causal Analysis in Economics: Methods and Applicationsamw2008.econ.usyd.edu.au/pdfs/B-5_Pu_Chen.pdf · to causality. Heckerman, Geiger, and Chickering (1995) provide the Bayesian

Causal Analysis in Economics: Methods and

Applications

Pu Chen∗ Chihying Hsiao† Peter Flaschel‡ Willi Semmler§

August 16, 2007

Abstract

In this paper we apply the method of inferred causation for macroeconomicanalysis. First we introduce briefly the theory of inferred causation developedby Pearl and Verma (1991). We apply this method to the identification ofstructural vector autoregression (SVAR) models. In an example of monetarypolicy analysis we demonstrate how causal information embedded in the datacan be used to identify the SVAR. We use the two step procedure developedin Chen and Hsiao (2007) for time series causal models (TSCM) to derive twostructural Phillips curves for the wage-price spiral. We demonstrate that whilein most structural equations the causal-effect relations are presumed, using themethod of inferred causation we can derived the causal-effect relation in thestructural equations from the data.

JEL Classification: C1,C51, E52, J3Keywords: Automated Learning, Inferred Causation, VAR, Identification, Mone-tary Policy, Wage-price Dynamic

∗Corresponding author, New School for Social Research, New York, Department of Economics,

E-Mail: [email protected]†University of Technology Sydney‡New School for Social Research§New School for Social Research, New York, Department of Economics

1

1 Introduction

Since the development of the successful learning algorithms for Bayesian networks,the probabilistic causal approach attracts more and more attention of the scientificcommunity1. Spirtes, Glymour, and Scheines (2001) provide a detailed descriptionof learning Bayesian networks through sequential tests and the causal interpretationof the test results. Pearl (2000) gives a rigorous account of the probabilistic approachto causality. Heckerman, Geiger, and Chickering (1995) provide the Bayesian tech-nique for learning Bayesian networks from data. Despite the controversial debate onthis Bayesian network causal approach2, the automated causal inference based onBayesian network models becomes an effective instrument to assess causal relationsempirically.

Recently, these graphical models have found their way into the literature on timeseries analysis and econometrics. Dahlhaus (2000) gives a graphical interpretationof the conditional independence among the elements of multivariate time series.Bach and Jordan (2004) present graphical models for multivariate time series inthe frequency domain. Eichler (2003) gives a graphical presentation of the Grangercausality among the elements a multivariate time series. Some pioneering works ofgraphical models in econometrics can be found in Glymour and G. (1988). Hoover(2005) sketches the application of the Bayesian network technique for identifyingstructural VAR models in light of the fact that the contemporaneous structuralparameters can have a causal interpretation. Swanson and Granger (1997) applythe similar concept to identity the causal chain in VAR residuals. Demiralp andHoover (2004) apply the Bayesian network method to VAR residuals to infer thecausal order in the money demand and the monetary transmission mechanism.

Following this line of research, we develop in this paper a causal model formultivariate time series data. We apply the probabilistic causal approach to definecausal models for multivariate time series. It turns out that under some reasonableadditional assumptions on the causal structures for time series, TSCMs becomestatistically assessable. Further we show that these TSCMs are equivalent to SVARmodels. In thess ways, we give a causal theoretical justification for the applicationof the automatic inference to identify a SVAR as described in Hoover (2005). Weinterpret a SVAR in terms of a contemporaneous causal structure and a temporalcausal structure. A two step procedure is developed to learn the contemporaneousand the temporal causal structure of a multivariate time series causal model.

The remainder of the paper is organized as follows. In Section 2 we introducebriefly the theory of inferred causation. Here we focus on the causal interpretation ofthe Bayesian network models and their relations to linear recursive structural models

1Although inferring causal relations used to be the primary target of statistical analysis, this

ambition was abandoned for a long time. See Pearl (2000) for more details.2see Cartwright (2001) and Pearl (2000) p. 41 for more details.

1

for jointly normal variables. In section 3 show how the method of inferred causationcan be used to solve the problem of identification of SVAR. We demonstrate inan example of SVAR for analysis of monetary policy how the automated learningprocedure can achieve a causal identification of SVAR. We show also the limit ofthe causal identification method. In Section 4 we extend the concept of the causalmodels to multivariate time series data and show how structural equations withcausal interpretations can be derived from data. We demonstrate this in an empiricalanalysis of price and wage Phillips curves. The last section concludes.

2 Inferred Causation

2.1 A Model Selection Approach to Inferred Causation

A fundamental assumption of the method of inferred causation is, as given in Def-inition 2 in Pearl and Verma (1991): the casual relations among a set of variablesU can be modelled in a directed acyclic graph(DAG) D and a set of parametersΘD, compatible with D. ΘD assigns a function xi = fi(pa(xi), εi) and a probabilitymeasure gi to each xi ∈ U , where pa(xi) are parents of xi in D and each εi is arandom disturbance distributed according to gi independently of the other ε’s andof any preceding xj : 0 < j < i.

The probability measure compatible with D is called to satisfy the Markov condi-tion in Pearl (2000) p.16. The Markov condition implies that conditioning on pa(xi),xi is independent of all its other predecessors. In particular it implies that the dis-turbance εi are independent form other ε’s. In addition to the Markov condition, theminimality of the causal structure3, D, and the stability of the distribution are twokey assumptions on the data-generating causal model to rule out the ambiguity ofthe statistical inference in recovering the data-generating causal model.4. Further, aDAG with a probability measure P that satisfy the Markov condition with respectto the DAG(See Fig.1 for examples.) prescribes an ordering of the variables in theDAG and the factorization of the joint distribution of the variables as the productof the conditional distributions. A sparse DAG implies in particular a set of condi-tional dependence and independence among variables. In (a) and (b) of Fig.1 A andC is said to be d-separated by B. This implies that for all compatible distributionswith the DAGs A and C would be dependent, but conditioning on B, they would beindependent. In the case of (a) B is said to screen A from B. In the case of (b) B iscalled the common cause of A and C. In (c) of Fig.1 A and C is not d-separated by

3See Definition 5 in Pearl and Verma (1991)4It is still an ongoing debate whether causality can be formulated in such assumptions. See

Cartwright (2001), Pearl (2000), Spirtes et al. (2001) Freedman and Humphreys (1998) for more

discussion. Spirtes et al. (2001) took an axiomatic approach to pave the logical basis for the method

of inferred causation.

2

B5. This implies that at least for one distribution compatible with the DAG, A andC would be independent, but conditioning on B they would become dependent6. B

is the effect of A and C. Here B is called an unshielded collider on the path ABC.In the literature an unshield collider is also called a v structure, because it consistsof two converging arrows whose ends are not connected. A shielded collider wouldhave a direct link between A and C.

A C

B

A C

B

(a) (b)

A C

B x3

x2

x1

x4

x5

(c) (d)Figure 1: Influence Diagram

Under the Markov assumption, a compatible distribution of a DAG can be fac-torized into the conditional distributions according to the DAG. Hence we knowthat the DAG (d) in Fig.1 implies that the joint distribution can be calculated asfollows.

f(x1t, x2t, x3t, x4t, x5t)

= f(x4t|x5t)f(x5t|x1t, x3t)f(x3t|x2t)f(x1t|x2t)f(x2t), (2.1)5See Pearl (2000) p. 16 for the detailed definition of d-separation.6See Pearl (2000) p.18.

3

This implies the following conditional independence: given x5t, x4t is independenton other variables; given x1t and x3t, x5t is independent on x2t; and given x2t, x3t

is independent on x1t.

The fundamental assumption of the method of inferred causation translates theproblem to infer causal relations among variables into a statistical problem to recoverthe true data generating DAG model using the observed data, and then to interpretthe directed edges in the DAG as causal-effect relations.

The implication of a DAG on the patterns of the conditional dependence andindependence invites inference of the data generating DAG from these patterns ofthe conditional dependence and independence. Identifying the underlying DAG fromthe patterns of conditional independence and dependence has been the main researchactivity in the area of inferred causation. We will give a more detailed descriptionabout it in the next section.

Alternatively, consistent model selection criteria can also be used to identify thedata generating DAG, if the data generating DAG is under the set of models to beselected. The assumption that the data generating DAG is under the set of DAGmodels under consideration is called the causal sufficiency assumption7.

Therefore, under causal sufficiency applying a consistent model selection criterionto search over all possible DAG models will identify the data generating DAG or itsobservationally equivalent models consistently.

In this paper we will use this method to uncover the data generating DAG. Thestatistical process of uncovering the data generating DAG is called learning of DAGin the literature.

In example (d) in Fig.1 we will search over all DAG models consisting of thefive variables x5t, x4t, x3t, x2t, x1t. A consistent model selection criterion evaluates amodel by the sum of its likelihood and a penalty on the dimensionality of the model.The likelihood is the leading term in this sum such that all misspecified models willnot be selected asymptotically and the penalty term will go to infinite as T → ∞,such that the probability to select a model with too many parameters will convergeto zero.

In this context, statistically learning of the causal order is equivalent to searchingfor the most parsimonious model that can account for the joint distribution of thevariables in the class of all possible recursive models.

Now it is of interest to ask:7If some variables are not observed(these kind of variables are called latent variables), then the

data generating DAG may not be within the set of DAG models to be investigated. The method of

inferred causation can be used to detect the existence of latent variables. We will not discuss this

issue in this paper. We consider here only the cases under causal sufficiency assumption.

4

• If data are generated from a causal model, can statistical procedures alwaysuniquely identify this causal model?

• If a causal model cannot be uniquely identified by statistical procedures, whichcausal properties of the causal model can be identified by statistical proce-dures?

• How effective is a statistical learning procedure?

The answers to these questions are the main research issues of the probabilis-tic causal approach. The first and the second question concern the observationalequivalence of causal models and the assumptions of causal models. The third oneconcerns the efficiency of algorithms to learn causal relations implied in the observeddata. Pearl (2000), Spirtes et al. (2001) and Heckerman et al. (1995) provide themost detailed and up-to-date accounts in this area.

Observationally equivalent models will generate data with identical statisticalproperties. Therefore, statistical method can only identify the underlying DAGs upto the observationally equivalent classes. For the observational equivalence we quotethe results in Pearl (2000) p.19.

Proposition 2.1 [Observational Equivalence ] Two DAGs(models) are observation-ally equivalent if and only if they have the same skeleton and the same set of v-structures, that is two converging arrows whose tails are not connected by an arrow(Verma and Pearl 1990).

Since statistical method cannot differ the observationally equivalent models fromeach other from the data, not every causal direction in a DAG can always be iden-tified according to this Proposition. Only those causal directions in a DAG canbe identified, if they constitute v structures or if their change would result in newv-structures or cycles. Consequently, if a data generating DAG has observationallyequivalent models, i.e. there exists some arrows in the DAG, the change of whosedirections will not lead to a new v structure or cycles, the direction of these arrowsin the DAG cannot be uniquely inferred from the data. The existence of observa-tional equivalence places a limit on the ability of statistical method to identify thethe directionality of dependence.

Given a set of data generated from a causal model, a statistical procedure canprincipally identify all the conditional independence. However, the statistical pro-cedure cannot differ whether this kind of independence is due to a lack of the edgein the DAG of the causal model or due to particularly chosen parameter values ofthe DAS such that the edge in this case implies the independence. To rule out thisambiguity, Pearl (2000) assumes that all the identified conditional independence aredue to lack of edges in the DAG of the causal model. This assumption is called

5

stability condition in Pearl (2000). In Spirtes et al. (2001) it is called faithfulnesscondition. This assumption is therefore important for interpreting the conditionaldependence and independence as causal relations.

2.2 DAGs and Structural Models

It can be shown that if X is jointly normally distributed, a Bayesian network modelof X is equivalent to a linear recursive structural equation model (SEM)(See Pearl(2000), p. 141.).

xj =j−1∑

k=1

ajkxk + εj , (2.2)

where εj are independently normally distributed. We call (2.2) a linear causal model.We summarize this fact in the following proposition8.

Proposition 2.2 If a set of variables X are jointly normal X ∼ N(0, Σ), a Bayesiannetwork model for X can be equivalently formulated as a linear recursive structuralequation model that is represented by a lower triangular coefficient matrix A withones on the principle diagonal. Any nonzero element in this coefficient matrix, sayαjk corresponds to an directed edge from variable k to variable j.

A =

1 0 . . . 0

α21 1. . .

......

. . . . . . 0αn1 αn2 . . . 1

=

1 0 . . . 0

−a21 1. . .

......

. . . . . . 0−an1 −an2 . . . 1

, (2.3)

where A is the triangular decomposition matrix of Σ with AΣA′ = D.

Proof: Let Σ be the covariance matrix of X. A Bayesian network model for X isa factorization of the joint distribution as product of the conditional distributions ofthe components of X in a given order. Because conditional distributions of jointlynormal distributed random variables are normal and the conditional means are linearfunction of conditioning variables, a Bayesian network model for jointly normaldistributed variables corresponds to linear recursive structural equations model. ¤

Given the correspondence between a recursive SEM and a DAG for jointly normalvariables, the parameter aij of the recursive structural model corresponds to the edgefrom the vertex xj to the vertex xi. aij = 0 corresponds to the absence of the edgefrom the vertex xj to the vertex xi, which implies that xj and xi are conditionally

8Bayesian network models can be used to encode and joint distributions. Therefore they can also

be applied to nonlinear models. Because linear models are often used in econometrics we discuss

here only linear models.

6

independent, given the predecessors of xi. Therefore, the more null restrictions arecursive model has, the simpler the corresponding DAG will be. Searching for theDAG with ’most conditional independence’ is equivalent to searching for the mostparsimonious recursive SEM for the data.

A useful property of multivariate normal distribution is that the conditional co-variance, the conditional variance and the conditional correlation coefficient: σXj |z,σXj ,Xi|z and ρXj ,Xi|z are all independent of the value z. Moreover the partial cor-relation coefficient is zero if and only if (Xi⊥Xj |z)9. Because we can estimate arecursive SEM by OLS, we have important relation between the parameter of therecursive SEM and the partial correlation coefficient:

rY X.Z = ρY X|ZσY |ZσX|Z

, (2.4)

where rY X.Z is the regression coefficient of Y in the linear regression on X and Z

Y = aX + b1z1 + b2z2 + ... + bkzk (2.5)

This means the coefficient a is given by a = rY X.Z . This relation is very useful forunderstanding algorithms to learn the causal structures.

2.3 Learning Bayesian Networks: Algorithms

As stated in the last section, inferring causal relations within a set of variables istantamount to selecting the most parsimonious recursive model within the class ofall possible recursive models. In principle, we could analyze all possible recursivemodels and find out the most parsimonious one. This is, however, only practicableif the number of variables is very small, because the number of all possible causalmodels grows explosively with the number of variables. For a system of 6 variablesthere are 3781503 possible models.

To solve this curse-of-dimensionality problem, a number of heuristic algorithmshave been proposed. There are basically three types of approaches to this problem.One is based on sequential tests of partial correlation coefficients. The tests runfrom the low order partial correlation coefficients in unconstrained models to thehigh order partial correlation coefficients10. A limited version of this algorithm canbe found in Swanson and Granger (1997). Hoover (2005) gives a very intuitivedescription of this procedure and Spirtes et al. (2001) provides a more detaileddiscussion. A basic version of the algorithm is presented by Pearl (2000) under thename IC algorithm as follows.

9(Xi⊥Xj |z) denotes that conditioned on z, Xi and Xj are independent.10See http://www.phil.cmu.edu/projects/tetrad/ for more details and software for this algorithm.

7

Algorithm 2.1 (IC Algorithm (Inductive Causation))

Input: P a stable11 distribution on a set X of variables.

Output: A pattern (DAG) compatible with P .

• for each pair of variables (Xi,Xj) ∈ X, search a set Sij such that (Xi⊥Xj |Sij)holds in P . Construct an undirected graph G such that vertices Xi and Xj areconnected with an edge if and only if no such set Sij can be found.

• For each pair of nonadjacent variables Xi and Xjwith a common neighbor Xk,check if Xk ∈ Sij.If it is, then continue. If is is not, then add arrowheads pointing as Xk:(Xi− > Xk < −Xj).

• In the partially directed graph that results, orient as many of the undirectededges as possible subject to two conditions: (i) the orientation should not createa new v structure; and (ii) the orientation should not create a directed cycle.

Proposition 2.3 Under the assumption of faithfulness, the IC-algorithm can con-sistently identify the inferrable causal structure, i.e. for T → ∞ the probability ofrecovering the inferrable causal structure of the data generating causal model con-verges to one.

Proof : See Pearl (2000) and Spirtes et al. (2001). p.114 2

This means that if the data generating linear causal model is statistically distin-guishable, the IC-algorithm will uniquely identify the causal order consistently. If thedata generating causal model is not statistically distinguishable, the IC-algorithmwill uniquely identify the causal order among the simultaneous causal blocks andthe v-structures consistently.

Remarks According to (2.5) we know that a zero parameter in a structuralmodel corresponds to a conditional independence. In step 1 of the IC algorithm,two vertexes are connected if and only if no subset can be found such that condi-tioning on this subset these two vertexes become independent. This implies thatthe DAG generated by the IC algorithm will contain no less zero restrictions thanany recursive models that representing the probability law P 12, because all possi-ble conditional independence that may correspond to zero parameters a recursivestructural equation are presented by missing edges in the DAG given by the IC

11Stability of a distribution means that the freely varying parameters of the data-generating

causal models will assume parameter values other than zero(or the probability to assume the value

zero is zero) in order that all identified zero parameter-values to be interpreted as zero restrictions

on the parameters. It is also known as faithfulness condition.12Under the assumption normality P is represented by the variance covariance matrix.

8

algorithm. On the other hand, under the assumption of causal sufficiency, i.e. P

over X is a DAG, the output of the IC algorithm will be a DAG that can accountfor the probability law P according to Proposition 2.3. Because a DAG correspondsto a recursive structural model. The DAG given by the IC algorithm cannot havemore zero restrictions than the most parsimonious recursive model by definition.Consequently the output of IC algorithm must be a most parsimonious recursivestructural model13.

The second approach is based on the Bayesian method of model averaging.14 Thistechnique combines the subjective knowledge with the information of the observeddata to infer the causal relation among variables. This kind of algorithm differs inthe choice of criteria for the goodness of fit that is called the score of a network, andin the choice of search strategy. Because the search problem is NP-hard15, heuristicsearch algorithms such as greedy search, greedy search with restarts, best-fit search,and Monte-Carlo method are used16.

The third approach, which we also use in the empirical application in this paper,uses classic model selection methods. Its implementation is similar to the Bayesianapproach but without any need for prior information. A network is evaluated ac-cording to information criteria such as AIC and BIC. The search algorithms aresimilar to those from the Bayesian approach, such as local greedy search, or greedysearch with restart.

Algorithm 2.2 (Greedy Search Algorithm)

Input: P a stable distribution on a set X of variables.

Output: A DAG compatible with P .

• Step 1: Start with a Bayesian network Ai.

• Step 2: Calculate the network score according to BIC/AIC/likelihood criterion.

• Step 3: Generate the local neighbor networks by either adding, removing orreversing an edge of the network Ai.

13The orientation of the v structure in Step 2 is necessary. Without loss of generality we assume

that xi is before xj in the order of recursion. The edge between xk and xj imply that either xk will

appear in the structural equation of xj or xk will appear in the structural equation of xj . Because

xk /∈ Sij implies that xk is not in the structural equation of xj in which xi has zero coefficient.

Therefore xk must be behind xj and xi in the order of recursion. This leads to the converging

arrows at xk.14Cf. Heckerman (1995).15See Heckerman (1995) for details.16See Heckerman (1995) for details. An R-package ”deal” for learning Bayesian networks can be

found at http://www.r-project.org/gR/

9

• Step 4: Calculate the scores for the local neighbor networks. Choose the onewith the highest score as Ai+1. If the highest score is larger than that of Ai,update Ai and goto Step 2. If the highest score is less than that of the originalAi, take output Ai as the local solution.

3 Application I: Identification of SVAR

3.1 The Identification Problem in SVAR

Consider a VAR for an m-dimensional vector of variables, Xt17:

Xt =p∑

i=1

BiXt−i + ut, E(ut) = 0, E(utu′t) = Σ (3.6)

Denote the m×1 vector of structural normalized innovations by εt with E(εtε′t) =

I. The problem now is to find a matrix A0, such that

A0ut = εt. (3.7)

The variance-covariance matrix of reduced-from residuals provides the restrictionthat

Σ = A−10 A−1

0′. (3.8)

The matrix A0 in (3.7) defines a structural VAR, which results by premultiplying(3.6) by A0:

A0Xt =p∑

i=1

AiXt−i + εt. (3.9)

The key point to note is that if the true data generating process (DGP) is theSVAR (3.9), the information provided by the estimated VAR (3.6) is not sufficientto recover that DGP. In fact after taking (3.8) into account there are m(m − 1)/2degrees of freedom in specifying A0 and thus the structural model.

A large body of literature explores the identification schemes for SVAR, Uh-lig (2005), Beaudry and Saito (1998), Christiano, Eichenbaum, and Evans (1998),Cochrane (1998), Leeper, Sims, and Zha (1996), Mojon (2005), Blanchard (1989),Bernanke (1986) and the references cited therein build a big liste of identificationschemes. The diverse identification approaches suggested so far are all based onimposing a priori structural restrictions on the matrix A0 that are supported by the-oretical arguments, plausibility or reasonability without any reference to the data.

17Most result of this section is taken from Chen and Lemke (2006).

10

In concordance with the basic motivation of VAR methodology to rule out the”incredible” identification conditions, the causal learning procedure for identificationtries to identify an SVAR only based on the properties of the data not on any a prioriinformation.

Due to the very nature of inferred causation, however, the procedure can onlyidentify a complete causal order, if the data does possess a complete causal order.Otherwise it can only identify causal orders to the extent that the date reveal them.The method of inferred causation provides us with an instrument to clarify howmuch theory is necessary for the identification of SVARs.

3.2 Identifying Monetary Policy Shocks

We apply the approach expounded in the previous section to the family of VARmodels considered in the seminal paper of Christiano et al. (1998) (CEE98). Inorder to relate our results to those obtained in their paper, we will first restrictourselves to the sample period 1965 to 1995. Our VAR contains seven variables: thelog of real GDP Y , the log of the consumer price index CPI, a measure of smoothedchanges of a commodity price index CP , the federal funds rate FF , the log of totalreserves TR, the log of nonborrowed reserves NBR and the log of the monetaryaggregate M1. Details on these data are provided in the appendix.

CEE98 use two benchmark identification schemes, one using the Federal Fundsrate FF as monetary policy variable, another with nonborrowed reserves NBR

playing this role. For their first scheme, it is assumed that the information set Ωt,on which the monetary authority bases its decisions, contains (besides lags of thefederal funds rate) contemporaneous and lagged values of GDP, the price level, andthe commodity price index, whereas for total reserves, non-borrowed reserves andthe money stock, only the lags are included.

With respect to our variables18 the first CEE98 identification scheme wouldentail the following block structure of the A0 matrix in (3.7). Partition the vec-tor of variables as X ′

t = (X ′1t, FFt, X ′

2t), with X ′1t = (Yt, CPIt, CPt)′ and X2t =

(NBRt, TRt,M1t). Then

A0 =

a11 0 0a21 a22 0a31 a32 a33

(3.1)

where a11 has dimension 3× 3, a22 is a scalar, and a33 is 3× 3.

We will apply the described learning procedure to the identification problemand explore in how far the corresponding results are in line with the assumptions of

18Note that CEE98 use the GDP deflator rather than the CPI within their analysis. They also

try both M1 and M2 as variants for the money stock.

11

CEE98. For that, we first estimate the unrestricted VAR with the 7 variables usinga lag length of 4 as advocated by the LR criterion. Then we utilize the greedy searchalgorithm described in the previous section to the estimated VAR residuals. Theresulting graph in Figure 1 shows the inferred causal relations between the structuralinnovations.

CPI Y M1

CP FF TR

NBR

Figure 1: Causal graph for structural innovations. Based on estimation sample1965Q1 - 1995Q4

The first thing to note is that the graph contains one v-structure: FF and TR

both point towards NBR but are not connected by an arrow themselves. Thus, thefederal funds rate and total reserves are causal for non-borrowed reserves. For theremaining three edges, the information in the data did not reveal any direction. Asstated in Proposition 2.1 two DAGs are observationally equivalent if they have sameskeleton and same set of v-structures. In the graph above the edges CP − Y − FF

with the direction CP → Y → FF is observationally equivalent to the the graphwith the direction CP ← Y ← FF . Therefore we present these edges withoutdirection. Thus, the algorithm finds contemporaneous dependence between thosevariables, but remains silent as for the direction of causality. Finally, the missingedges imply independence among the corresponding variables.

In light of this graph, the identification scheme suggested in CEE98 is not re-jected by the information about causal order contained in the data. The dependencebetween Y and FF , and between CP and FF indicates that it is in fact reasonableto include contemporaneous CP and Y in the monetary authority’s information setΩt.

12

The detected independence between CPI and FF , between M1 and FF , as wellas between TR and FF indicate that it is irrelevant whether to put them beforeFF or after FF in the recursive ordering. The graph shows further that the dataare indecisive about the causal order between Y and FF and between CP and FF .Therefore, theoretical arguments have to be invoked to provide an identificationcondition. Following the argument of CEE98, we will put CP and Y before FF

for the subsequent analysis. Finally, the missing edges in the causal graph providequite a large number of further zero restrictions on the matrix A0.

Based on the identifying restrictions, we compute impulse responses to a federalfunds rate shock which are given in figure 2. For comparison, figure 3 shows theIRF that corresponds to the (blockwise) ordering of CEE98 - represented by A0

in (3.1) with the corresponding partitioning of X ′t = (X ′

1t, FFt, X′2t), where X ′

1t =(Yt, CPIt, CPt)′ and X2t = (NBRt, TRt, M1t). That is, the additional restrictionsobtained from causality analysis are not imposed.19

We observe that there are slight differences in the shape of the impulse responsesbetween figures 2 and 3 and also between the corresponding asymptotic confidencebands. (Note the different scaling in some of the figures.) However, altogether,the additional restrictions imposed by our causality analysis do not lead to strikingdifferences compared to the results of the CEE98 ordering.

19It should be noted that the particular ordering of the elements within X1t and X2t, respectively,

does not affect the impulse responses to a shock of FF , as stated in Proposition 4.1 (iii) of CEE98.

13

-.012

-.008

-.004

.000

.004

.008

2 4 6 8 10 12 14 16 18 20 22 24

Response of Y to FF

-.025

-.020

-.015

-.010

-.005

.000

.005

.010

2 4 6 8 10 12 14 16 18 20 22 24

Response of CPI to FF

-1.5

-1.0

-0.5

0.0

0.5

1.0

2 4 6 8 10 12 14 16 18 20 22 24

Response of CP to FF

-.025

-.020

-.015

-.010

-.005

.000

.005

.010

2 4 6 8 10 12 14 16 18 20 22 24

Response of TR to FF

-.03

-.02

-.01

.00

.01

2 4 6 8 10 12 14 16 18 20 22 24

Response of NBR to FF

-.016

-.012

-.008

-.004

.000

.004

2 4 6 8 10 12 14 16 18 20 22 24

Response of M1 to FF

Response to Structural One S.D. Innovations – 2 S.E.

Figure 2: Impulse responses to a one-standard-deviation shock to the federal fundsrate FF , based on identification scheme from causal analysis. Sample: 1965Q11995Q4

14

-.010

-.008

-.006

-.004

-.002

.000

.002

2 4 6 8 10 12 14 16 18 20 22 24

Response of Y to FF

-.025

-.020

-.015

-.010

-.005

.000

.005

2 4 6 8 10 12 14 16 18 20 22 24

Response of CPI to FF

-1.6

-1.2

-0.8

-0.4

0.0

0.4

0.8

2 4 6 8 10 12 14 16 18 20 22 24

Response of CP to FF

-.016

-.012

-.008

-.004

.000

.004

.008

2 4 6 8 10 12 14 16 18 20 22 24

Response of TR to FF

-.020

-.015

-.010

-.005

.000

.005

.010

2 4 6 8 10 12 14 16 18 20 22 24

Response of NBR to FF

-.012

-.010

-.008

-.006

-.004

-.002

.000

.002

2 4 6 8 10 12 14 16 18 20 22 24

Response of M1 to FF

Response to Cholesky One S.D. Innovations – 2 S.E.

Figure 3: Impulse responses to a one-standard-deviation shock to the federal fundsrate FF , based on identification scheme from CEE98 ordering. Sample: 1965Q11995Q4

15

When we extend the sample period until 2006Q2, we observe that the basic causalstructure of the structural innovations remains quite stable. Comparing figure 4 to1, the only difference is the contemporaneous dependence between FF and CPI

that is implied by the additional edge linking CP and CPI.

CPI Y M1

CP FF TR

NBR

Figure 4: Causal graph for structural innovations. Based on estimation sample1965Q1 - 2006Q2

Finally, we consider the implications of the causal identification scheme for thecase that the monetary policy variable is given by NBR instead of FF , which cor-responds to the second identification scheme of CEE98. The algorithm finds thatNBR is causally dependent on FF . Thus, in this case, the causal identificationprocedure would indicate that the second identification scheme chosen in Christianoet al. (1998) is not conformable with the data. This is also reflected in the corre-sponding impulse responses as those corresponding to the causal procedure differsignificantly from those corresponding to the CEE98 ordering. (The IR graph is notshown in this paper.)

This difference is because that the second identification scheme of CEE98 as-sumes that FF is causally dependent on NBR which contradicts the causal orderidentified by the method of inferred causation. Consequently both the policy shocksand the IRFs of the second identification scheme of CEE98 differ from those of thecausal identification, respectively.

One theoretically interesting question in this context is what would be the causalgraph such that both the first and the second benchmark identification schemes ofCEE98 were all correct?

16

The answer depends on what we mean with ”correct”. If we mean that thesebenchmark identification schemes were exclusively identification schemes that areconformable with the casual order embedded in the data, i.e. any other identificationschemes were all contradictory to the causal order in the data, then the causal graphhad to be like follows.

M1

FF NBR

TR

Y

CPI

CP

Figure 5: Causal graph support CEE98 identification schemes

The causal graph in Fig 7 is characterized by that every arrow in the graph is acomponent of a v-structure, which means that the causal graph has no observationalequivalence.

In the terminology of CEE98, this causal graph implies that the monetary au-thority see Y , CPI and CP before he makes monetary policy on FF , and M1, TR

are causally dependent on FF . Since NBR and FF are causally independent, it isindifferent whether NBR belongs to the set of X1 or the set of X2 with respect tothe monetary policy variable FF . This implies that the first identification schemeof CEE98 is conformable with the causal graph in Fig. 7.

Since FF and NBR are symmetric in the causal graph, NBR can equally wellbe used as a monetary policy variable. Therefore both the first and the secondidentification schemes are consistent with the causal graph in Fig. 7.

If the correctness does not have the exclusivity, there may exist a large numberof causal graphs that do not contradict the identification schemes as suggested inCEE98. A typical causal graph is as follows in Fig. 8.

This graph possesses no v-structure. No causal order among the structural in-novations can be identified from the data. All the directions of the arrows in thegraph are not binding, because there exist other observationally equivalent models

17

M1

FF NBR

TR

Y

CPI

CP

Figure 6: Causal graph consistent with all just identified identification schemes

which the arrows are just in the opposite directions (Compare the example in Sec-tion 2.1). Since any order of recursion is a valid representation of the joint reducedform innovations, this causal graph will not contradict the benchmark identificationschemes of CEE98.

4 Application II: Derivation of Structural Models with

Causal-Effect Relations

4.1 Time Series Causal Models

Assuming that the elements of the multivariate time series Xit, i = 1, 2, ...n andt = 1, 2, ...T are jointly normal, then a casual model for the multivariate time seriesis a recursive structural model in all the n×T element. Since temporal informationprovides a nature causal order, the recursive structural model must follow the thetemporal order. We can write the system as follows.

A01 0 . . . 0A21 A02 0...

. . ....

AT1 AT2 . . . A0T

X1

X2...

XT

=

ε1ε2...

εT

, (4.2)

where εt, t = 1, 2, ..., T are vectors of independent random variables20.

According to Chen and Hsiao (2007) we have:20In the model above we have assumed that the random process have started at t = 1.

18

Definition 4.1 (TSCM) A linear recursive model of time series is called a time-series-causal-model if it satisfies the following three constrains:

1. temporal causal constraint,

2. time invariant constraint, and

3. finite causal influence constraint.

For p = 2 the causal model is written as follows.

A0 0 . . . . . . 0A1 A0 0 . . . 0A2 A1 A0 0 . . . 0

0. . . . . . . . . . . .

...... 0 A2 A1 A0 00 . . . 0 A2 A1 A0

X1

X2...

XT−1

XT

=

ε1ε2...

εT−1

εT

. (4.3)

The first two rows represent the initial condition for the time series. The otherrows represent the invariant causal relations over time.

The causal relations among the time series variables are expressed by the coeffi-cient matrices A0, A1, A2, ..., Ap. According to Assumption 2 above, the A0 is itselfa low triangular matrix. It describes the contemporaneous causal relations amongthe elements of the n-vector Xt. Ai describes the causal dependence between theelements of Xt and elements of Xt−i. Zero elements in the coefficient matrices Ai

implies corresponding causal independence.

According to Chen and Hsiao (2007) the TSCM is equivalent to a SVAR identifiedby the causal order in the element of Xit.

Proposition 4.1 (Two step procedure for TSCMs)

• If the contemporaneous causal structure of the data generating TSCM is obser-vationally distinguishable, the two step procedure will identify the true causalstructure of the TSCM consistently.

• If a TSCM is observationally distinguishable but the contemporaneous causalstructure is observationally indistinguishable, the two step procedure with aconsistent model selection criterion will ”uniquely” identify the data generatingcasual model consistently.

• If a TSCM is observationally indistinguishable, then the two step procedurewith a consistent model selection criterion will uniquely identify the causalorder of the simultaneous causal blocks.

Proof: See Chen and Hsiao (2007)

19

4.2 Tow Phillips Curves with Causal-Effect Relations

Since Fair (2000) introduces two Phillips curves, one for price inflation and one ofwage inflation, two Phillips curves are used in many macro model building21.

The empirical data for the relevant variables discussed above are taken from Eco-nomic Data - FRED c©22. The data shown below are quarterly, seasonally adjusted,annualized where necessary and are all available from 1947:1 to 2004:4. Up to therate of unemployment they represent the business sector of the U.S. economy. Wewill make use in our estimations below of the range from 1965:1 to 2004:4 solely, i.e.,roughly speaking of the last five business cycles that characterized the evolution ofthe U.S economy. We thus neglect the evolution following World War II to a largerdegree.

Variable Transformation Mnemonic Description of the untransformed series

e log(1-UNRATE/100) UNRATE Unemployment Rate (%)

u log(GDPC1/GDPPOT) GDPC1,GDPPOT GDPC1: Real Gross Domestic Product of Bil-

lions of Chained 2000 Dollars, GDPPOT: Real

Potential Gross Domestic Product of Billions

of Chained 2000 Dollars, u:Capacity Utiliza-

tion: Business Sector (%)

w log(HCOMPBS) HCOMPBS Business Sector: Compensation Per Hour, In-

dex 1992=100

p log(IPDBS) IPDBS Business Sector: Implicit Price Deflator, Index

1992=100

z log(OPHPBS) OPHPBS Business Sector: Output Per Hour of All Per-

sons, Index 1992=100

πm MA(dp) inflationary climate measured by the moving

average of price inflation in the last 12 periods

Table 3: Raw Data used for empirical investigation of the model21See Chen and flaschel (2005), Fair (2000), Chen, Chiarella, Flaschel, and Semmler (2005), ...22http://research.stlouisfed.org/fred2/.

20

0.00

0.05

0.10

DLN

W2

0.00

0.04

0.08

0.12

DLN

P2

0.90

0.92

0.94

0.96

0 50 100 150

E2

Time

0.92

0.96

1.00

1.04

U2

0.02

0.06

PIN

−0.0

50.

000.

050.

10

0 50 100 150

DLN

Z2

Time

tsdata

Figure 9: Data for the analysis of wage-price spiral

We construct first a six dimensional VAR model for (dw, dp, e, u, dz, pim)23. Us-ing the Schwarz information criterion we select the lag length 124.

Applying greedy search algorithm with random restarts to the estimated resid-uals of the unconstrained VAR(1), we get following DAG for the contemporaneouscausal structure.

23In a series of unit root tests all 6 time series can be taken as stationary. See Chen and Hsiao

(2007).24The choice of one lag in a system with quarterly data seems to be very unusual. Taking into

account that the inflationary climate variable πm is a summary of the lagged information, this

choice would be not so surprising. See Appendix for details about the possibility of alternative

choice of lag length.

21

dz

e

u

pim

dw

dp

Figure 10: The contemporaneous causal graph in the wage price spiral

The corresponding contemporaneous causal structure matrix is:

1 −0.47 −1.93 1.56 0 −0.500 1 0 0 0 00 0 1 −0.36 0 0.050 0 0 1 0 00 0 0 0 1 00 0.38 0 −2.73 −3.18 1

(4.4)

According to the causal graph, the causal order in the contemporaneous innova-tion is (u, dp, πm, dz, e, dw). After rearranging the contemporaneous causal structurein this order the get the recursive contemporaneous casual structure matrix:

A0 =

1 0 0 0 0 00 1 0 0 0 00 0 1 0 0 0

−2.73 0.38 −3.18 1 0 0−0.36 0 0 0.05 1 01.56 −0.47 0 −0.50 −1.93 1

(4.5)

In the second step we learn the temporal causal structure A1 by applying OLS tothe recursive SEM with the identified contemporaneous causal structural A0. Afterneglecting the insignificant coefficient in the OLS estimation we obtain the estimatedthe temporal causal structure:

22

A1 =

−1.03 0 0 0 0.28 0−0.19 −0.52 −0.40 0.07 0 00.02 −0.12 −0.90 0 −0.08 00.64 0 0 0 0 −0.210.30 0.02 0 0 −0.91 0−2.01 0 −0.54 0 1.94 0

(4.6)

From the estimated TSCM

A0Xt + A1Xt−1 = ut, (4.7)

we obtain two structural Phillips curves, one for the price inflation one for thewage inflation as follows.

dpt = 0.52dpt−1 + 0.40πmt−1 + 0.19ut−1 − 0.07dzt−1 − 0.21 (4.8)

dwt = 0.47dpt + 0.54πmt−1 + 0.44ut + 0.5dzt + 2.0(∆et −∆ut)− 0.42 (4.9)

Unlike most formulations of Phillips curves that are derived based on theoreticalarguments, these two structural Phillips curves are derived based on the causalinformation embedded in the observed data. They represent the causal influence ofthe right-hand-side variables on the dependent variables.

The price Phillips curve shows that the price inflation is driven by the demandpressure measured by the utilization rate of capacity ut−1, and the inflationaryclimate πmt−1 in which the economy operates i.e. the inertial of the price inflationrate. The growth of the labor productivity reduces the wage cost and hence theprice inflation.

In the wage Phillips curve the wage inflation rate is driven by the demand pres-sure term measured by ut, the living cost pressure measured by pt and the inflation-ary climate πmt−1. The growth of labor productivity acts positively on the wageinflation rate. The variable ut is identified as a proper measure of demand pressure.Since ut is highly correlated with the rate of labor utilization of the employed la-bor, this implies that the level of the rate of labor utilization of the insiders on thelabor market within firms is the demand pressure that acts on wage inflation. How-ever if the increase of labor utilization spills over from the insiders to the outsiders∆et −∆ut > 0, large wage inflation will be expected. Generally these two Phillipscurves confirm the results obtained in Chen and flaschel (2006).

23

5 Conclusion

In this paper we show how the method of inferred causation can be applied tomacroeconomic analysis. The automated learning procedure can be used to solvethe identification problem of SVAR. The benefits of this causal identification methodare twofold. First, it helps to reduce the number of additional assumptions requiredfor achieving identification. In the extreme case, the algorithm would provide acomplete set of causal relationships between structural innovations, implying a com-plete solution to the identification problem. Usually, however, only some of theserelations will be uncovered by the approach, while economic theory has to be used ascomplementary information. Moreover, the suggested learning algorithm may alsoextract information from the data that there are additional (overidentifying zero)restrictions for the mapping between reduced-form and structural residuals. Sec-ond, the output of the proposed algorithm may be used to check whether a givenidentification scheme, e.g. decided on the basis of particular theory, can be rejectedby the causal structure embedded in the data.

We applied the causal approach to the monetary policy VAR by Christiano et al.(1998). The resulting graph of causal relationships between structural residuals isin line with the assumptions made in CEE98 for the case that the federal fundsrate acts as monetary policy instrument. Moreover, the causal analysis providesa number of zero restrictions on the matrix mapping reduced-form and structuralinnovations. The implied impulse responses are broadly similar to those obtainedusing the CEE98 decomposition scheme, but show slight variations for the shape ofsome of the variables. In contrast, if one takes the stand that the monetary policyinstrument is measured by nonborrowed reserves - which the second identificationscheme of CEE98 is assuming - their ordering of variables is rejected by the causalanalysis.

The TSCMs developed in Chen and Hsiao (2007) is applied to analysis the dy-namic wage price spiral. The two step learning procedure for TSCMs is used touncover the directionality of dependence in the data, such as contemporaneous de-pendence as well as the temporal dependence. We applied the TSCM to the wage-price dynamic and obtained the result that the price-inflation rate is one of thecauses that drive the wage-inflation rate while the wage-inflation rate has only avery weak indirect feed-back on the price-inflation rate. From the TSCM of thewage-price dynamic we obtained two structural Phillips curves that represent thecausal influence in the determination of the price-inflation rate and the wage infla-tion rate. As structural equations in economics are genuinely interpreted as causalrelation, TSCMs provide potentially a way to derive structural equations in whichthe causal interpretation of the relations is justified.

As the application of the method of inferred causation for the identification ofcausal relations among economic variables is still fairly new, many issues such as

24

the robustness of the resulting causal graphs with respect to the choice of differentsample periods, implications of relaxing the triangularity assumption on A0, theinfluence of the applied statistical criteria in the learning procedure, the efficiencyof the algorithm, or the technique for obtaining a structure that is globally optimal,deserve further investigation.

25

Appendix

A The Data

All data are obtained from www.freelunch.com. The raw data corresponding to thevariables FF , M1, NBR, TR, Y , CPI and CP used in the paper (see section 3.2for the transformations taken) are as follows:

For FF : Federal Funds Rate, (% P.A.); Fed H15

For M1: M1 (SA Billions $); Fed H.6 Money Stock and Liquid Assets, and DebtMeasures

For NBR: Nonborrowed reserves adjusted for changes in reserve requirements,(Mil. $, SA)

For TR: Total reserves adjusted for changes in reserve requirements, (Mil. $,SA); FRB: Aggregate Reserves of Depository Institutions - H.3

For Y : GDP in billions of chained 2000 dollars, BEA;

For CPI: CPI Urban Consumer - All items, (1982-84=100, SA), BLS

For CP : KR-CRB Futures Price Index (1967=100); Knight-Ridder, CommodityIndex Report

26

References

Bach, F. R. and Jordan, M. I. (2004). Learning graphical models for stationarytime series. IEEE Tansmitions on signal processing, 52:2189–2199.

Beaudry, P. and Saito, M. (1998). Estimating the Effects of Monetary Shocks:An Evaluation of Different Approaches. Journal of Monetary Economics, 42:241–260.

Bernanke, B. (1986). Alternative explanations of the money-income correlation.Carnegie-Rochester Conference Series on Public Policy, 25:49–100.

Blanchard, O. J. (1989). A traditional interpretation of macroeconomic fluctua-tions. American Economic Review , 79:1146–1164.

Cartwright, N. (2001). What is wrong with bayes nets? MONIST , 84:242–264.

Chen, P., Chiarella, C., Flaschel, P., and Semmler, W. (2005). Keynesianmacrodynamics and the phillips curve. an estimated baseline macro-model forthe U.S. economy. Quantitative and Empirical Analysis of Nonlinear DynamicMacromodels.

Chen, P. and flaschel, P. (2005). Keynesian dynamics and the wage-price spiral:Identifying downward rigidities. Computational Economics, 25:115–142.

— (2006). Measuring the interaction of wage and price phillips curves for the u.s.economy. Studies in Nonlinear Dynamics and Econometrics, 10, No. 4:Article 2.

Chen, P. and Hsiao, C. (2007). Learning causal relations in multivariate timesereis data. Economics ejournal , Discussion Paper 2007-15.

Chen, P. and Lemke, W. (2006). Identifying svar models of monetary policyusing the method of inferred causation. Mimeo, New School of Social Research,New York .

Christiano, L. J., Eichenbaum, M., and Evans, C. (1998). Monetary PolicyShocks: What Have We Learned and to what End? Working Paper 6400, NationalBureau of Economic Research.

Cochrane, J. H. (1998). What do the VARs mean? Measuring the Output Effectsof Monetary Policy. Journal of Monetary Economics, 41:277–300.

Dahlhaus, R. (2000). Graphical interpretation for multivariate time series.Metrika, 51:157–172.

Demiralp, S. and Hoover, K. (2004). Searching for the causal structure of avector autoregression. Oxford Bulletin of Economics and statistics, 65:745–767.

27

Eichler, M. (2003). Granger causality and path diagrams for multivariate timeseries. Discussion papers, Department of Statistics, the University of Chicago.

Fair, R. (2000). Testing the NAIRU model for United States. Review of Economicsand Statistics, 82:64–71.

Freedman, D. and Humphreys, P. (1998). Are there algorithms that discovercausal structure? http://www.stat.berkeley.edu/ census/514.pdf .

Glymour, S. and G., S. (1988). latent variables, causal model and overidentfyingconstraints. Journal of Econometrics, 39:175–198.

Heckerman (1995). A tutorial on learning with bayesian networks. MicrosoftResearch, MSR-TR-95-06:http://research.microsoft.com/research/pubs.

Heckerman, D., Geiger, D., and Chickering, D. (1995). Learning bayesiannetwork: The combination of knowledge and statistical data. Machine Learning ,20:197–243.

Hoover, K. (2005). Automatic inference of the contemporaneous causal order ofa system of equations. Econometric Theory , 21:69–77.

Leeper, E. M., Sims, C. A., and Zha, T. (1996). What Does Monetary PolicyDo? Brookings Papers on Economic Activity, 2:1–62.

Mojon, B. (2005). When Did Unsystematic Monetary Policy Have an Effect onInflation? Working Paper 559, European Central Bank.

Pearl, J. (2000). Causality . Cambridge University Press, 1st edition.

Pearl, J. and Verma, T. (1991). A theory of inferred causation. In J.A. Allen, R.Fikes, and E. Sandewall(Eds.), Principles of Knowledge Representation and Rea-soning: Procedings of the 2nd International Conference, San Mateo, CA: MorganKaufmann.:441–52.

Spirtes, P., Glymour, C., and Scheines, R. (2001). Causation, Prediction andSearch. Springer-Verlag, New York / Berlin / London / Heidelberg / Paris, 2ndedition.

Swanson, N. and Granger, J. (1997). Impulse response functions based oncausal approach to residual orthogonalization in vector autoregressions. Journalof the American Statistical Association, 92:357–367.

Uhlig, H. (2005). What are the Effects of Monetary Policy on Output? Re-sults from an Agnostic Identification Procedure. Journal of Monetary Economics,52:381–419.

28

Documents

Causal Analysis in Economics: Methods and Applicationsamw2008.econ.usyd.edu.au/pdfs/B-5_Pu_Chen.pdf · to causality. Heckerman, Geiger, and Chickering (1995) provide the Bayesian