29
Econometric Theory, 33, 2017, 263–291. doi:10.1017/S0266466616000086 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN CROSS-SECTION AND PANEL DATA XUN LU Hong Kong University of Science and Technology LIANGJUN SU Singapore Management University HALBERT WHITE University of California, San Diego Granger noncausality in distribution is fundamentally a probabilistic conditional independence notion that can be applied not only to time series data but also to cross- section and panel data. In this paper, we provide a natural definition of structural causality in cross-section and panel data and forge a direct link between Granger (G) causality and structural causality under a key conditional exogeneity assump- tion. To put it simply, when structural effects are well defined and identifiable, Gnon-causality follows from structural noncausality, and with suitable conditions (e.g., separability or monotonicity), structural causality also implies Gcausality. This justifies using tests of Gnon-causality to test for structural noncausality under the key conditional exogeneity assumption for both cross-section and panel data. We pay special attention to heterogeneous populations, allowing both structural hetero- geneity and distributional heterogeneity. Most of our results are obtained for the general case, without assuming linearity, monotonicity in observables or unobserv- ables, or separability between observed and unobserved variables in the structural relations. 1. INTRODUCTION Recently, White and Lu (2010, WL) have provided conditions establishing the equivalence of Granger (G) causality and a natural notion of struc- tural causality in structural vector autoregressions (VARs) and in time-series This paper is dedicated to the memory of Hal White, who is profoundly missed. It builds on White’s Econometric Theory Lecture at the 7th Symposium on Econometric Theory and Application (SETA, 2011) in Melbourne. White acknowledges Peter C. B. Phillips for his kind invitation. We gratefully thank the Editor Peter C. B. Phillips, the Co-editor Guido M. Kuersteiner, and three anonymous referees for their many constructive comments on the pre- vious version of the paper. We are indebted to the participants of the SETA, who provided helpful suggestions and discussions. Su gratefully acknowledges the Singapore Ministry of Education for Academic Research Fund under grant number MOE2012-T2-2-021. All errors are the authors’ sole responsibilities. Address correspondence to Xun Lu, Department of Economics, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong; e-mail: [email protected]. c Cambridge University Press 2016 263 Cambridge Core terms of use, available at https:/www.cambridge.org/core/terms. https://doi.org/10.1017/S0266466616000086 Downloaded from https:/www.cambridge.org/core. Singapore Management University (SMU), on 07 Feb 2017 at 05:33:05, subject to the

33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

Econometric Theory 33 2017 263ndash291doi101017S0266466616000086

GRANGER CAUSALITY ANDSTRUCTURAL CAUSALITY IN

CROSS-SECTION AND PANEL DATA

XUN LUHong Kong University of Science and Technology

LIANGJUN SUSingapore Management University

HALBERT WHITEUniversity of California San Diego

Granger noncausality in distribution is fundamentally a probabilistic conditionalindependence notion that can be applied not only to time series data but also to cross-section and panel data In this paper we provide a natural definition of structuralcausality in cross-section and panel data and forge a direct link between Granger(Gminus) causality and structural causality under a key conditional exogeneity assump-tion To put it simply when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality and with suitable conditions(eg separability or monotonicity) structural causality also implies GminuscausalityThis justifies using tests of Gminusnon-causality to test for structural noncausality underthe key conditional exogeneity assumption for both cross-section and panel data Wepay special attention to heterogeneous populations allowing both structural hetero-geneity and distributional heterogeneity Most of our results are obtained for thegeneral case without assuming linearity monotonicity in observables or unobserv-ables or separability between observed and unobserved variables in the structuralrelations

1 INTRODUCTION

Recently White and Lu (2010 WL) have provided conditions establishingthe equivalence of Granger (Gminus) causality and a natural notion of struc-tural causality in structural vector autoregressions (VARs) and in time-series

This paper is dedicated to the memory of Hal White who is profoundly missed It builds on Whitersquos EconometricTheory Lecture at the 7th Symposium on Econometric Theory and Application (SETA 2011) in Melbourne Whiteacknowledges Peter C B Phillips for his kind invitation We gratefully thank the Editor Peter C B Phillips theCo-editor Guido M Kuersteiner and three anonymous referees for their many constructive comments on the pre-vious version of the paper We are indebted to the participants of the SETA who provided helpful suggestions anddiscussions Su gratefully acknowledges the Singapore Ministry of Education for Academic Research Fund undergrant number MOE2012-T2-2-021 All errors are the authorsrsquo sole responsibilities Address correspondence to XunLu Department of Economics Hong Kong University of Science and Technology Clear Water Bay Hong Konge-mail xunluusthk

ccopy Cambridge University Press 2016 263

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

264 XUN LU ET AL

natural experiments The goal of this paper is to establish the analogous equiv-alence between Gminuscausality and structural causality in cross-section and paneldata under certain conditional exogeneity assumptions

As Gminuscausality is mostly examined in the time series context it might bethought that it is strictly a time-series concept if so it would make no senseto talk about Gminuscausality in cross-sections In fact however Gminuscausality isfundamentally a conditional independence notion as pointed out by Florensand Mouchart (1982) and Florens and Fougere (1996) Holland (1986) statesthat ldquoin my opinion Grangerrsquos essential ideas involving causation do not re-quire the time-series setting he adoptedrdquo As we show Gminuscausality has di-rectly relevant and useful causal content not only for time-series cross-sectionpanels but also for pure cross-sections under certain conditional exogeneityassumptions

In this paper we focus on the aspects of the relation between Gminuscausality andstructural causality specific to cross-section or panel data An important data fea-ture here is unobserved heterogeneity We pay special attention to two sources ofheterogeneity that impact testing for structural causality namely structural het-erogeneity and distributional heterogeneity The structural heterogeneity refersto cross-group variation in unobservable constants (eg unknown nonrandomparameters) that enter the structural equation Although unobserved heterogene-ity has a long tradition in the literature on panel studies (see Hsiao 2014 andreferences therein) this type of heterogeneity with a group structure has recentlyreceived considerable attention (see eg Sun 2005 Lin and Ng 2012 Deb andTrivedi 2013 Lu and Su 2014 Su Shi and Phillips 2014 Bonhomme andManresa 2015 Sarafidis and Weber 2015 and Bester and Hansen 2016) Thegroup structure has sound theoretical foundations from game theory or macroe-conomic models where multiplicity of Nash equilibria is expected (see eg Hahnand Moon 2010) The distributional heterogeneity refers to the cross-group vari-ation in certain conditional distributions but seems to have received relativelyless attention in the literature1 Browning and Carro (2007) provide some exam-ples of such heterogeneity in micro-data A similar question concerning distri-butional heterogeneity is also discussed in Hausman and Woutersen (2014) andBurda Harding and Hausman (2015) for duration models The presence of eithersource of heterogeneity plays a central role in linking and testing Gminuscausalityand structural causality

The main contributions of this paper can be clearly articulated First we intro-duce a heterogeneous population data generating process (DGP) for both cross-section and dynamic panel data and extend the concept of Gminuscausality from thetime series analysis to such settings In particular we focus on various versions ofGminuscausality ldquoin distributionrdquo which are suitable for studying nonseparable andnonparametric structural equations In the cross-section data time is not explicitlyinvolved thus Gminusnon-causality is a simple conditional independence relation Inpanel data time plays an explicit role and we pay special attention to the role oftemporal precedence in defining Gminuscausality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 265

Second as in the time-series context we give a natural definition of struc-tural causality in cross-section and panel data We distinguish the structuralcausality from various average causal effects We show that given the con-ditional form of exogeneity structural noncausality implies GminusnoncausalityIf we further assume monotonicity or separability in the structural equationsstructural causality implies Gminuscausality In the case where we do not assumemonotonicity or separability we strengthen structural causality to structuralcausality with positive probability (wpp) and show that structural causalitywpp implies Gminuscausality These results justify using tests of Gminusnon-causalityto test for structural noncausality We emphasize that appropriately choosingcovariates that ensure the conditional exogeneity assumption is the key toendowing Gminusnon-causality with a structural interpretation For example weshow that both leads and lags can be appropriate covariates in the panel datasetting

Third we establish the linkage between population-group conditional exogene-ity and its sample analogue in a heterogeneous population where the latter formsthe basis for linking Gminuscausality and structural causality We show that with-out the conditional exogeneity at the sample level the derivative of conditionalexpectation can be decomposed into three parts a weighted average marginaleffect a bias term due to endogeneity and a bias term due to heterogeneity Thusas emphasized in the literature (see eg Angrist and Kuersteiner 2004 2011)without such assumptions as conditional exogeneity it is impossible to give theGminusnon-causality test a causal interpretation We show that conditional exogene-ity ensures that the two bias terms vanish in which case certain average subgroupcausal effects with mixing weights are identified

The plan of the paper is as follows In Section 2 we provide a review onvarious concepts of causality In Section 3 we first specify the cross-sectionheterogeneous population DGP and the sampling scheme We define the cross-section structural causality and static Gminuscausality and establish the equivalenceof Gminuscausality and structural causality under certain conditional exogeneityconditions Testing for Gminuscausality and structural causality is also discussedIn Section 4 we consider structural causality and Gminuscausality in panel data Wefocus on dynamic panel structures and derive some testable hypotheses Section5 concludes All the mathematical proofs are gathered in the Appendix

2 LITERATURE REVIEW

This paper builds on the vast literature on causality For reviews on causality ineconometrics see Zellner (1979) Heckman (2000 2008) Imbens (2004) Hoover(2008) Kuersteiner (2008) Angrist and Pischke (2009) Imbens and Wooldridge(2009) among others Broadly speaking we can divide the literature into twocategories The first includes Gminuscausality and Sims causality The second per-tains to the causality defined on structural equations and that defined on potentialoutcomes2

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

266 XUN LU ET AL

21 Gminuscausality and Sims causalityGminuscausality and Sims causality were originally proposed to study time seriesdata Granger (1969) defines Gminuscausality ldquoin meanrdquo based on conditional expec-tations while Granger (1980) and Granger and Newbold (1986) generalize it toGminuscausality ldquoin distributionrdquo based on conditional distributions In this paper wefocus on Gminuscausality ldquoin distributionrdquo which we simply refer to as GminuscausalityTo define it we first introduce some notation For any sequence of random vectorsYt t = 01 we let Y t equiv (Y0 Yt ) denote its ldquotminushistoryrdquo and let σ(Y t )denote the sigma-field generated by Y t Let Dt Yt Xt be a sequence of ran-dom vectors where Dt is the cause of interest Yt is the response of interestand Xt is some variates Granger and Newbold (1986) say that Dtminus1 does notGminuscause Yt+k with respect to σ(Y tminus1 Xtminus1) if for all t = 12

Ft+k

(middot ∣∣ Dtminus1Y tminus1 Xtminus1

)= Ft+k

(middot ∣∣ Y tminus1 Xtminus1

) k = 01 (21)

where Ft+k( middot | Dtminus1Y tminus1 Xtminus1

)denotes the conditional distribution function

of Yt+k given(Dtminus1 Y tminus1 Xtminus1

) and Ft+k

( middot | Y tminus1 Xtminus1)

denotes that of Yt+k

given(Y tminus1 Xtminus1

) As Florens and Mouchart (1982) and Florens and Fougere

(1996) have noted eq(21) is equivalent to the conditional independence relation

Yt+k perp Dtminus1 |Y tminus1 Xtminus1 (22)

where we use X perp Y | Z to denote that X and Y are independent given Z If

E(

Yt+k∣∣Dtminus1Y tminus1 Xtminus1

)= E

(Yt+k

∣∣Y tminus1 Xtminus1)

as (23)

we say that Dtminus1 does not Gminuscause Yt+k with respect to σ(Y tminus1 Xtminus1

)ldquoin meanrdquo Eq(23) is a conditional mean independence statement In the litera-ture most of the discussion on Gminuscausality focuses on k = 0 (see eg Granger1980 1988 and Kuersteiner 2008) The definition of Gminuscausality does notrely on any economic theory or structural assumptions Granger (1969 p 430)emphasizes that ldquoThe definition of causality used above is based entirely on thepredictability of some seriesrdquo Therefore Gminuscausality does not necessarily revealany underlying causal relation and it is entirely unfounded to draw any structuralor policy conclusions from the Gminuscausality tests

ldquoSims noncausalityrdquo was originally defined in Sims (1972) Florens (2003) andAngrist and Kuersteiner (2004 2011) give a generalized definition

Y infint perp Dtminus1 |Dtminus2Y tminus1 Xtminus1 t = 1

where Y infint = (Yt Yt+1 ) Chamberlain (1982) and Florens and Mouchart (1982)

show that under some mild regularity conditions Gminusnoncausality with k = 0 andSims noncausality are equivalent when the covariates Xt are absent When thecovariates Xt are present however Gminusnon-causality and Sims noncausality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 267

are in general not equivalent For an excellent review on the relationship betweenGminusnoncausality and Sims noncausality see Kuersteiner (2008) Similar toGminusnoncausality Sims noncausality is completely based on predictability and hasno structural interpretation

22 Structural causality and causality in the potential outcomeframework

On the other hand causality defined on structural equations or potential outcomesis mostly discussed in cross-section data The structural equation approach canbe traced back to the work of the Cowles Commission in the 1940s (see egHaavelmo 1943 1944 and Koopmans 1950) though most of their work wasbased on linear equations More recently researchers have generalized linearequations to nonseparable and nonparametric equations (see eg Chesher 20032005 Matzkin 2003 2007 and Altonji and Matzkin 2005) In 2015 Economet-ric Theory published a special issue in memory of Trygve Haavelmo (Volume 31Issue 01 and 02) which contains many interesting discussions on recent develop-ments of causality related to Haavelmorsquos structural models

Consider a simple case where the response of interest is Yi and the cause ofinterest is Di The causal relation between Di and Yi can be characterized by anunknown structural equation r such that

Yi = r(Di Ui )

where Ui is other unobservable causes of Yi For example when Yi is the demandDi is the price and Ui represents certain demand shocks r is the demand functionthat is derived from utility maximization r is a general function and does notneed to be linear parametric or separable between observable causes Di andunobservable causes Ui Here r has a structural or causal meaning and we definethe causal effect of Di on Yi based on r Let D and U be the support of Di andUi respectively If r (du) is a constant function of d for all d isin D and all u isin Uthen we simply say that Di does not structurally cause Yi (see eg Heckman2008 and White and Chalak 2009) For a binary Di the effect of Di on Yi isr(1u) minus r(0u) when Ui = u The effect can depend on unobservable u thusunobservable heterogeneity is allowed For a continuous Di the marginal effectof Di on Yi is partr(du)partd when Di = d and Ui = u To identify the effects of Di

on Yi we often impose the assumption of conditional exogeneity

Di perp Ui | Xi

where Xi is some observable covariates This includes the special case where Di

and Ui are independent corresponding to Di being randomizedHalbert White has made substantial contribution on defining identifying and

estimating causal effects in structural equations White and Chalak (2009) ex-tend Judea Pearlrsquos causal model to a settable system which incorporates features

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

268 XUN LU ET AL

of central interest to economists and econometricians optimization equilibriumand learning Roughly speaking a settable system is ldquoa mathematical frameworkdescribing an environment in which multiple agents interact under uncertaintyrdquo(White and Chalak 2009 p 1760) In the settable system a variable of interesthas two roles ldquoresponsesrdquo and ldquosettingsrdquo When the value of the variable is deter-mined by the structural equation these values are called ldquoresponsesrdquo In contrastwhen the value is not determined by the structural equation but is instead set toone of its admissible values these values are called ldquosettingsrdquo They show thaton the settable system causes and effects can be rigorously defined Chalak andWhite (2012) provide definitions of direct indirect and total causality on the set-table system in terms of functional dependence and show how causal relationsand conditional independence are connected Chalak and White (2011) providean exhaustive characterization of potentially identifying conditional exogeneityrelationships in liner structural equations and introduce conditioning and condi-tional extended instrumental variables to identify causal effects White and Chalak(2013) provide a detailed discussion on the identification and identification failureof causal effects in structural equations White and Chalak (2010) discuss how totest the conditional exogeneity assumption

The treatment effect literature adopts the potential outcome framework (seeeg Rubin 1974 Rosenbaum and Rubin 1983 and Holland 1986) For a causeof interest Di and response of interest Yi we define a collection of potential out-comes Yi (d) d isin D where D is the support of Di Then we can define causalor treatment effect based on the potential outcomes Specifically for two valuesd and dlowast in D we define Yi (dlowast)minus Yi (d) as the causal effects of Di on Yi whenDi changes from d to dlowast If Yi (d) = Yi (dlowast) for all d and dlowast isin D we say thatDi has no causal effects on Yi For example when Di is binary ie D = 01Yi (0) and Yi (1) are the potential outcomes corresponding Di = 0 and Di = 1respectively The treatment or causal effect is Yi (1)minus Yi (0) Two average effectsoften discussed in the literature are

the average treatment effect (ATE) E(Yi (1)minusYi (0))

and

the average treatment effect on the treated (ATT) E(Yi (1)minusYi (0) | Di = 1)

Note that we only observe one outcome in data For example when Di is binarythe observed outcome is simply Yi = Di Yi (1) + (1 minus Di )Yi (0) To identify theeffects we often impose the assumption of unconfoundedness or selection onobservables

Yi (d) perp Di | Xi

where Xi is observable covariates Lechner (2001) Imbens (2004) Angrist andPischke (2009) and Imbens and Wooldridge (2009) provide reviews on identi-fication and estimation of various effects in the potential outcome framework

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 2: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

264 XUN LU ET AL

natural experiments The goal of this paper is to establish the analogous equiv-alence between Gminuscausality and structural causality in cross-section and paneldata under certain conditional exogeneity assumptions

As Gminuscausality is mostly examined in the time series context it might bethought that it is strictly a time-series concept if so it would make no senseto talk about Gminuscausality in cross-sections In fact however Gminuscausality isfundamentally a conditional independence notion as pointed out by Florensand Mouchart (1982) and Florens and Fougere (1996) Holland (1986) statesthat ldquoin my opinion Grangerrsquos essential ideas involving causation do not re-quire the time-series setting he adoptedrdquo As we show Gminuscausality has di-rectly relevant and useful causal content not only for time-series cross-sectionpanels but also for pure cross-sections under certain conditional exogeneityassumptions

In this paper we focus on the aspects of the relation between Gminuscausality andstructural causality specific to cross-section or panel data An important data fea-ture here is unobserved heterogeneity We pay special attention to two sources ofheterogeneity that impact testing for structural causality namely structural het-erogeneity and distributional heterogeneity The structural heterogeneity refersto cross-group variation in unobservable constants (eg unknown nonrandomparameters) that enter the structural equation Although unobserved heterogene-ity has a long tradition in the literature on panel studies (see Hsiao 2014 andreferences therein) this type of heterogeneity with a group structure has recentlyreceived considerable attention (see eg Sun 2005 Lin and Ng 2012 Deb andTrivedi 2013 Lu and Su 2014 Su Shi and Phillips 2014 Bonhomme andManresa 2015 Sarafidis and Weber 2015 and Bester and Hansen 2016) Thegroup structure has sound theoretical foundations from game theory or macroe-conomic models where multiplicity of Nash equilibria is expected (see eg Hahnand Moon 2010) The distributional heterogeneity refers to the cross-group vari-ation in certain conditional distributions but seems to have received relativelyless attention in the literature1 Browning and Carro (2007) provide some exam-ples of such heterogeneity in micro-data A similar question concerning distri-butional heterogeneity is also discussed in Hausman and Woutersen (2014) andBurda Harding and Hausman (2015) for duration models The presence of eithersource of heterogeneity plays a central role in linking and testing Gminuscausalityand structural causality

The main contributions of this paper can be clearly articulated First we intro-duce a heterogeneous population data generating process (DGP) for both cross-section and dynamic panel data and extend the concept of Gminuscausality from thetime series analysis to such settings In particular we focus on various versions ofGminuscausality ldquoin distributionrdquo which are suitable for studying nonseparable andnonparametric structural equations In the cross-section data time is not explicitlyinvolved thus Gminusnon-causality is a simple conditional independence relation Inpanel data time plays an explicit role and we pay special attention to the role oftemporal precedence in defining Gminuscausality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 265

Second as in the time-series context we give a natural definition of struc-tural causality in cross-section and panel data We distinguish the structuralcausality from various average causal effects We show that given the con-ditional form of exogeneity structural noncausality implies GminusnoncausalityIf we further assume monotonicity or separability in the structural equationsstructural causality implies Gminuscausality In the case where we do not assumemonotonicity or separability we strengthen structural causality to structuralcausality with positive probability (wpp) and show that structural causalitywpp implies Gminuscausality These results justify using tests of Gminusnon-causalityto test for structural noncausality We emphasize that appropriately choosingcovariates that ensure the conditional exogeneity assumption is the key toendowing Gminusnon-causality with a structural interpretation For example weshow that both leads and lags can be appropriate covariates in the panel datasetting

Third we establish the linkage between population-group conditional exogene-ity and its sample analogue in a heterogeneous population where the latter formsthe basis for linking Gminuscausality and structural causality We show that with-out the conditional exogeneity at the sample level the derivative of conditionalexpectation can be decomposed into three parts a weighted average marginaleffect a bias term due to endogeneity and a bias term due to heterogeneity Thusas emphasized in the literature (see eg Angrist and Kuersteiner 2004 2011)without such assumptions as conditional exogeneity it is impossible to give theGminusnon-causality test a causal interpretation We show that conditional exogene-ity ensures that the two bias terms vanish in which case certain average subgroupcausal effects with mixing weights are identified

The plan of the paper is as follows In Section 2 we provide a review onvarious concepts of causality In Section 3 we first specify the cross-sectionheterogeneous population DGP and the sampling scheme We define the cross-section structural causality and static Gminuscausality and establish the equivalenceof Gminuscausality and structural causality under certain conditional exogeneityconditions Testing for Gminuscausality and structural causality is also discussedIn Section 4 we consider structural causality and Gminuscausality in panel data Wefocus on dynamic panel structures and derive some testable hypotheses Section5 concludes All the mathematical proofs are gathered in the Appendix

2 LITERATURE REVIEW

This paper builds on the vast literature on causality For reviews on causality ineconometrics see Zellner (1979) Heckman (2000 2008) Imbens (2004) Hoover(2008) Kuersteiner (2008) Angrist and Pischke (2009) Imbens and Wooldridge(2009) among others Broadly speaking we can divide the literature into twocategories The first includes Gminuscausality and Sims causality The second per-tains to the causality defined on structural equations and that defined on potentialoutcomes2

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

266 XUN LU ET AL

21 Gminuscausality and Sims causalityGminuscausality and Sims causality were originally proposed to study time seriesdata Granger (1969) defines Gminuscausality ldquoin meanrdquo based on conditional expec-tations while Granger (1980) and Granger and Newbold (1986) generalize it toGminuscausality ldquoin distributionrdquo based on conditional distributions In this paper wefocus on Gminuscausality ldquoin distributionrdquo which we simply refer to as GminuscausalityTo define it we first introduce some notation For any sequence of random vectorsYt t = 01 we let Y t equiv (Y0 Yt ) denote its ldquotminushistoryrdquo and let σ(Y t )denote the sigma-field generated by Y t Let Dt Yt Xt be a sequence of ran-dom vectors where Dt is the cause of interest Yt is the response of interestand Xt is some variates Granger and Newbold (1986) say that Dtminus1 does notGminuscause Yt+k with respect to σ(Y tminus1 Xtminus1) if for all t = 12

Ft+k

(middot ∣∣ Dtminus1Y tminus1 Xtminus1

)= Ft+k

(middot ∣∣ Y tminus1 Xtminus1

) k = 01 (21)

where Ft+k( middot | Dtminus1Y tminus1 Xtminus1

)denotes the conditional distribution function

of Yt+k given(Dtminus1 Y tminus1 Xtminus1

) and Ft+k

( middot | Y tminus1 Xtminus1)

denotes that of Yt+k

given(Y tminus1 Xtminus1

) As Florens and Mouchart (1982) and Florens and Fougere

(1996) have noted eq(21) is equivalent to the conditional independence relation

Yt+k perp Dtminus1 |Y tminus1 Xtminus1 (22)

where we use X perp Y | Z to denote that X and Y are independent given Z If

E(

Yt+k∣∣Dtminus1Y tminus1 Xtminus1

)= E

(Yt+k

∣∣Y tminus1 Xtminus1)

as (23)

we say that Dtminus1 does not Gminuscause Yt+k with respect to σ(Y tminus1 Xtminus1

)ldquoin meanrdquo Eq(23) is a conditional mean independence statement In the litera-ture most of the discussion on Gminuscausality focuses on k = 0 (see eg Granger1980 1988 and Kuersteiner 2008) The definition of Gminuscausality does notrely on any economic theory or structural assumptions Granger (1969 p 430)emphasizes that ldquoThe definition of causality used above is based entirely on thepredictability of some seriesrdquo Therefore Gminuscausality does not necessarily revealany underlying causal relation and it is entirely unfounded to draw any structuralor policy conclusions from the Gminuscausality tests

ldquoSims noncausalityrdquo was originally defined in Sims (1972) Florens (2003) andAngrist and Kuersteiner (2004 2011) give a generalized definition

Y infint perp Dtminus1 |Dtminus2Y tminus1 Xtminus1 t = 1

where Y infint = (Yt Yt+1 ) Chamberlain (1982) and Florens and Mouchart (1982)

show that under some mild regularity conditions Gminusnoncausality with k = 0 andSims noncausality are equivalent when the covariates Xt are absent When thecovariates Xt are present however Gminusnon-causality and Sims noncausality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 267

are in general not equivalent For an excellent review on the relationship betweenGminusnoncausality and Sims noncausality see Kuersteiner (2008) Similar toGminusnoncausality Sims noncausality is completely based on predictability and hasno structural interpretation

22 Structural causality and causality in the potential outcomeframework

On the other hand causality defined on structural equations or potential outcomesis mostly discussed in cross-section data The structural equation approach canbe traced back to the work of the Cowles Commission in the 1940s (see egHaavelmo 1943 1944 and Koopmans 1950) though most of their work wasbased on linear equations More recently researchers have generalized linearequations to nonseparable and nonparametric equations (see eg Chesher 20032005 Matzkin 2003 2007 and Altonji and Matzkin 2005) In 2015 Economet-ric Theory published a special issue in memory of Trygve Haavelmo (Volume 31Issue 01 and 02) which contains many interesting discussions on recent develop-ments of causality related to Haavelmorsquos structural models

Consider a simple case where the response of interest is Yi and the cause ofinterest is Di The causal relation between Di and Yi can be characterized by anunknown structural equation r such that

Yi = r(Di Ui )

where Ui is other unobservable causes of Yi For example when Yi is the demandDi is the price and Ui represents certain demand shocks r is the demand functionthat is derived from utility maximization r is a general function and does notneed to be linear parametric or separable between observable causes Di andunobservable causes Ui Here r has a structural or causal meaning and we definethe causal effect of Di on Yi based on r Let D and U be the support of Di andUi respectively If r (du) is a constant function of d for all d isin D and all u isin Uthen we simply say that Di does not structurally cause Yi (see eg Heckman2008 and White and Chalak 2009) For a binary Di the effect of Di on Yi isr(1u) minus r(0u) when Ui = u The effect can depend on unobservable u thusunobservable heterogeneity is allowed For a continuous Di the marginal effectof Di on Yi is partr(du)partd when Di = d and Ui = u To identify the effects of Di

on Yi we often impose the assumption of conditional exogeneity

Di perp Ui | Xi

where Xi is some observable covariates This includes the special case where Di

and Ui are independent corresponding to Di being randomizedHalbert White has made substantial contribution on defining identifying and

estimating causal effects in structural equations White and Chalak (2009) ex-tend Judea Pearlrsquos causal model to a settable system which incorporates features

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

268 XUN LU ET AL

of central interest to economists and econometricians optimization equilibriumand learning Roughly speaking a settable system is ldquoa mathematical frameworkdescribing an environment in which multiple agents interact under uncertaintyrdquo(White and Chalak 2009 p 1760) In the settable system a variable of interesthas two roles ldquoresponsesrdquo and ldquosettingsrdquo When the value of the variable is deter-mined by the structural equation these values are called ldquoresponsesrdquo In contrastwhen the value is not determined by the structural equation but is instead set toone of its admissible values these values are called ldquosettingsrdquo They show thaton the settable system causes and effects can be rigorously defined Chalak andWhite (2012) provide definitions of direct indirect and total causality on the set-table system in terms of functional dependence and show how causal relationsand conditional independence are connected Chalak and White (2011) providean exhaustive characterization of potentially identifying conditional exogeneityrelationships in liner structural equations and introduce conditioning and condi-tional extended instrumental variables to identify causal effects White and Chalak(2013) provide a detailed discussion on the identification and identification failureof causal effects in structural equations White and Chalak (2010) discuss how totest the conditional exogeneity assumption

The treatment effect literature adopts the potential outcome framework (seeeg Rubin 1974 Rosenbaum and Rubin 1983 and Holland 1986) For a causeof interest Di and response of interest Yi we define a collection of potential out-comes Yi (d) d isin D where D is the support of Di Then we can define causalor treatment effect based on the potential outcomes Specifically for two valuesd and dlowast in D we define Yi (dlowast)minus Yi (d) as the causal effects of Di on Yi whenDi changes from d to dlowast If Yi (d) = Yi (dlowast) for all d and dlowast isin D we say thatDi has no causal effects on Yi For example when Di is binary ie D = 01Yi (0) and Yi (1) are the potential outcomes corresponding Di = 0 and Di = 1respectively The treatment or causal effect is Yi (1)minus Yi (0) Two average effectsoften discussed in the literature are

the average treatment effect (ATE) E(Yi (1)minusYi (0))

and

the average treatment effect on the treated (ATT) E(Yi (1)minusYi (0) | Di = 1)

Note that we only observe one outcome in data For example when Di is binarythe observed outcome is simply Yi = Di Yi (1) + (1 minus Di )Yi (0) To identify theeffects we often impose the assumption of unconfoundedness or selection onobservables

Yi (d) perp Di | Xi

where Xi is observable covariates Lechner (2001) Imbens (2004) Angrist andPischke (2009) and Imbens and Wooldridge (2009) provide reviews on identi-fication and estimation of various effects in the potential outcome framework

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 3: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 265

Second as in the time-series context we give a natural definition of struc-tural causality in cross-section and panel data We distinguish the structuralcausality from various average causal effects We show that given the con-ditional form of exogeneity structural noncausality implies GminusnoncausalityIf we further assume monotonicity or separability in the structural equationsstructural causality implies Gminuscausality In the case where we do not assumemonotonicity or separability we strengthen structural causality to structuralcausality with positive probability (wpp) and show that structural causalitywpp implies Gminuscausality These results justify using tests of Gminusnon-causalityto test for structural noncausality We emphasize that appropriately choosingcovariates that ensure the conditional exogeneity assumption is the key toendowing Gminusnon-causality with a structural interpretation For example weshow that both leads and lags can be appropriate covariates in the panel datasetting

Third we establish the linkage between population-group conditional exogene-ity and its sample analogue in a heterogeneous population where the latter formsthe basis for linking Gminuscausality and structural causality We show that with-out the conditional exogeneity at the sample level the derivative of conditionalexpectation can be decomposed into three parts a weighted average marginaleffect a bias term due to endogeneity and a bias term due to heterogeneity Thusas emphasized in the literature (see eg Angrist and Kuersteiner 2004 2011)without such assumptions as conditional exogeneity it is impossible to give theGminusnon-causality test a causal interpretation We show that conditional exogene-ity ensures that the two bias terms vanish in which case certain average subgroupcausal effects with mixing weights are identified

The plan of the paper is as follows In Section 2 we provide a review onvarious concepts of causality In Section 3 we first specify the cross-sectionheterogeneous population DGP and the sampling scheme We define the cross-section structural causality and static Gminuscausality and establish the equivalenceof Gminuscausality and structural causality under certain conditional exogeneityconditions Testing for Gminuscausality and structural causality is also discussedIn Section 4 we consider structural causality and Gminuscausality in panel data Wefocus on dynamic panel structures and derive some testable hypotheses Section5 concludes All the mathematical proofs are gathered in the Appendix

2 LITERATURE REVIEW

This paper builds on the vast literature on causality For reviews on causality ineconometrics see Zellner (1979) Heckman (2000 2008) Imbens (2004) Hoover(2008) Kuersteiner (2008) Angrist and Pischke (2009) Imbens and Wooldridge(2009) among others Broadly speaking we can divide the literature into twocategories The first includes Gminuscausality and Sims causality The second per-tains to the causality defined on structural equations and that defined on potentialoutcomes2

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

266 XUN LU ET AL

21 Gminuscausality and Sims causalityGminuscausality and Sims causality were originally proposed to study time seriesdata Granger (1969) defines Gminuscausality ldquoin meanrdquo based on conditional expec-tations while Granger (1980) and Granger and Newbold (1986) generalize it toGminuscausality ldquoin distributionrdquo based on conditional distributions In this paper wefocus on Gminuscausality ldquoin distributionrdquo which we simply refer to as GminuscausalityTo define it we first introduce some notation For any sequence of random vectorsYt t = 01 we let Y t equiv (Y0 Yt ) denote its ldquotminushistoryrdquo and let σ(Y t )denote the sigma-field generated by Y t Let Dt Yt Xt be a sequence of ran-dom vectors where Dt is the cause of interest Yt is the response of interestand Xt is some variates Granger and Newbold (1986) say that Dtminus1 does notGminuscause Yt+k with respect to σ(Y tminus1 Xtminus1) if for all t = 12

Ft+k

(middot ∣∣ Dtminus1Y tminus1 Xtminus1

)= Ft+k

(middot ∣∣ Y tminus1 Xtminus1

) k = 01 (21)

where Ft+k( middot | Dtminus1Y tminus1 Xtminus1

)denotes the conditional distribution function

of Yt+k given(Dtminus1 Y tminus1 Xtminus1

) and Ft+k

( middot | Y tminus1 Xtminus1)

denotes that of Yt+k

given(Y tminus1 Xtminus1

) As Florens and Mouchart (1982) and Florens and Fougere

(1996) have noted eq(21) is equivalent to the conditional independence relation

Yt+k perp Dtminus1 |Y tminus1 Xtminus1 (22)

where we use X perp Y | Z to denote that X and Y are independent given Z If

E(

Yt+k∣∣Dtminus1Y tminus1 Xtminus1

)= E

(Yt+k

∣∣Y tminus1 Xtminus1)

as (23)

we say that Dtminus1 does not Gminuscause Yt+k with respect to σ(Y tminus1 Xtminus1

)ldquoin meanrdquo Eq(23) is a conditional mean independence statement In the litera-ture most of the discussion on Gminuscausality focuses on k = 0 (see eg Granger1980 1988 and Kuersteiner 2008) The definition of Gminuscausality does notrely on any economic theory or structural assumptions Granger (1969 p 430)emphasizes that ldquoThe definition of causality used above is based entirely on thepredictability of some seriesrdquo Therefore Gminuscausality does not necessarily revealany underlying causal relation and it is entirely unfounded to draw any structuralor policy conclusions from the Gminuscausality tests

ldquoSims noncausalityrdquo was originally defined in Sims (1972) Florens (2003) andAngrist and Kuersteiner (2004 2011) give a generalized definition

Y infint perp Dtminus1 |Dtminus2Y tminus1 Xtminus1 t = 1

where Y infint = (Yt Yt+1 ) Chamberlain (1982) and Florens and Mouchart (1982)

show that under some mild regularity conditions Gminusnoncausality with k = 0 andSims noncausality are equivalent when the covariates Xt are absent When thecovariates Xt are present however Gminusnon-causality and Sims noncausality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 267

are in general not equivalent For an excellent review on the relationship betweenGminusnoncausality and Sims noncausality see Kuersteiner (2008) Similar toGminusnoncausality Sims noncausality is completely based on predictability and hasno structural interpretation

22 Structural causality and causality in the potential outcomeframework

On the other hand causality defined on structural equations or potential outcomesis mostly discussed in cross-section data The structural equation approach canbe traced back to the work of the Cowles Commission in the 1940s (see egHaavelmo 1943 1944 and Koopmans 1950) though most of their work wasbased on linear equations More recently researchers have generalized linearequations to nonseparable and nonparametric equations (see eg Chesher 20032005 Matzkin 2003 2007 and Altonji and Matzkin 2005) In 2015 Economet-ric Theory published a special issue in memory of Trygve Haavelmo (Volume 31Issue 01 and 02) which contains many interesting discussions on recent develop-ments of causality related to Haavelmorsquos structural models

Consider a simple case where the response of interest is Yi and the cause ofinterest is Di The causal relation between Di and Yi can be characterized by anunknown structural equation r such that

Yi = r(Di Ui )

where Ui is other unobservable causes of Yi For example when Yi is the demandDi is the price and Ui represents certain demand shocks r is the demand functionthat is derived from utility maximization r is a general function and does notneed to be linear parametric or separable between observable causes Di andunobservable causes Ui Here r has a structural or causal meaning and we definethe causal effect of Di on Yi based on r Let D and U be the support of Di andUi respectively If r (du) is a constant function of d for all d isin D and all u isin Uthen we simply say that Di does not structurally cause Yi (see eg Heckman2008 and White and Chalak 2009) For a binary Di the effect of Di on Yi isr(1u) minus r(0u) when Ui = u The effect can depend on unobservable u thusunobservable heterogeneity is allowed For a continuous Di the marginal effectof Di on Yi is partr(du)partd when Di = d and Ui = u To identify the effects of Di

on Yi we often impose the assumption of conditional exogeneity

Di perp Ui | Xi

where Xi is some observable covariates This includes the special case where Di

and Ui are independent corresponding to Di being randomizedHalbert White has made substantial contribution on defining identifying and

estimating causal effects in structural equations White and Chalak (2009) ex-tend Judea Pearlrsquos causal model to a settable system which incorporates features

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

268 XUN LU ET AL

of central interest to economists and econometricians optimization equilibriumand learning Roughly speaking a settable system is ldquoa mathematical frameworkdescribing an environment in which multiple agents interact under uncertaintyrdquo(White and Chalak 2009 p 1760) In the settable system a variable of interesthas two roles ldquoresponsesrdquo and ldquosettingsrdquo When the value of the variable is deter-mined by the structural equation these values are called ldquoresponsesrdquo In contrastwhen the value is not determined by the structural equation but is instead set toone of its admissible values these values are called ldquosettingsrdquo They show thaton the settable system causes and effects can be rigorously defined Chalak andWhite (2012) provide definitions of direct indirect and total causality on the set-table system in terms of functional dependence and show how causal relationsand conditional independence are connected Chalak and White (2011) providean exhaustive characterization of potentially identifying conditional exogeneityrelationships in liner structural equations and introduce conditioning and condi-tional extended instrumental variables to identify causal effects White and Chalak(2013) provide a detailed discussion on the identification and identification failureof causal effects in structural equations White and Chalak (2010) discuss how totest the conditional exogeneity assumption

The treatment effect literature adopts the potential outcome framework (seeeg Rubin 1974 Rosenbaum and Rubin 1983 and Holland 1986) For a causeof interest Di and response of interest Yi we define a collection of potential out-comes Yi (d) d isin D where D is the support of Di Then we can define causalor treatment effect based on the potential outcomes Specifically for two valuesd and dlowast in D we define Yi (dlowast)minus Yi (d) as the causal effects of Di on Yi whenDi changes from d to dlowast If Yi (d) = Yi (dlowast) for all d and dlowast isin D we say thatDi has no causal effects on Yi For example when Di is binary ie D = 01Yi (0) and Yi (1) are the potential outcomes corresponding Di = 0 and Di = 1respectively The treatment or causal effect is Yi (1)minus Yi (0) Two average effectsoften discussed in the literature are

the average treatment effect (ATE) E(Yi (1)minusYi (0))

and

the average treatment effect on the treated (ATT) E(Yi (1)minusYi (0) | Di = 1)

Note that we only observe one outcome in data For example when Di is binarythe observed outcome is simply Yi = Di Yi (1) + (1 minus Di )Yi (0) To identify theeffects we often impose the assumption of unconfoundedness or selection onobservables

Yi (d) perp Di | Xi

where Xi is observable covariates Lechner (2001) Imbens (2004) Angrist andPischke (2009) and Imbens and Wooldridge (2009) provide reviews on identi-fication and estimation of various effects in the potential outcome framework

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 4: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

266 XUN LU ET AL

21 Gminuscausality and Sims causalityGminuscausality and Sims causality were originally proposed to study time seriesdata Granger (1969) defines Gminuscausality ldquoin meanrdquo based on conditional expec-tations while Granger (1980) and Granger and Newbold (1986) generalize it toGminuscausality ldquoin distributionrdquo based on conditional distributions In this paper wefocus on Gminuscausality ldquoin distributionrdquo which we simply refer to as GminuscausalityTo define it we first introduce some notation For any sequence of random vectorsYt t = 01 we let Y t equiv (Y0 Yt ) denote its ldquotminushistoryrdquo and let σ(Y t )denote the sigma-field generated by Y t Let Dt Yt Xt be a sequence of ran-dom vectors where Dt is the cause of interest Yt is the response of interestand Xt is some variates Granger and Newbold (1986) say that Dtminus1 does notGminuscause Yt+k with respect to σ(Y tminus1 Xtminus1) if for all t = 12

Ft+k

(middot ∣∣ Dtminus1Y tminus1 Xtminus1

)= Ft+k

(middot ∣∣ Y tminus1 Xtminus1

) k = 01 (21)

where Ft+k( middot | Dtminus1Y tminus1 Xtminus1

)denotes the conditional distribution function

of Yt+k given(Dtminus1 Y tminus1 Xtminus1

) and Ft+k

( middot | Y tminus1 Xtminus1)

denotes that of Yt+k

given(Y tminus1 Xtminus1

) As Florens and Mouchart (1982) and Florens and Fougere

(1996) have noted eq(21) is equivalent to the conditional independence relation

Yt+k perp Dtminus1 |Y tminus1 Xtminus1 (22)

where we use X perp Y | Z to denote that X and Y are independent given Z If

E(

Yt+k∣∣Dtminus1Y tminus1 Xtminus1

)= E

(Yt+k

∣∣Y tminus1 Xtminus1)

as (23)

we say that Dtminus1 does not Gminuscause Yt+k with respect to σ(Y tminus1 Xtminus1

)ldquoin meanrdquo Eq(23) is a conditional mean independence statement In the litera-ture most of the discussion on Gminuscausality focuses on k = 0 (see eg Granger1980 1988 and Kuersteiner 2008) The definition of Gminuscausality does notrely on any economic theory or structural assumptions Granger (1969 p 430)emphasizes that ldquoThe definition of causality used above is based entirely on thepredictability of some seriesrdquo Therefore Gminuscausality does not necessarily revealany underlying causal relation and it is entirely unfounded to draw any structuralor policy conclusions from the Gminuscausality tests

ldquoSims noncausalityrdquo was originally defined in Sims (1972) Florens (2003) andAngrist and Kuersteiner (2004 2011) give a generalized definition

Y infint perp Dtminus1 |Dtminus2Y tminus1 Xtminus1 t = 1

where Y infint = (Yt Yt+1 ) Chamberlain (1982) and Florens and Mouchart (1982)

show that under some mild regularity conditions Gminusnoncausality with k = 0 andSims noncausality are equivalent when the covariates Xt are absent When thecovariates Xt are present however Gminusnon-causality and Sims noncausality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 267

are in general not equivalent For an excellent review on the relationship betweenGminusnoncausality and Sims noncausality see Kuersteiner (2008) Similar toGminusnoncausality Sims noncausality is completely based on predictability and hasno structural interpretation

22 Structural causality and causality in the potential outcomeframework

On the other hand causality defined on structural equations or potential outcomesis mostly discussed in cross-section data The structural equation approach canbe traced back to the work of the Cowles Commission in the 1940s (see egHaavelmo 1943 1944 and Koopmans 1950) though most of their work wasbased on linear equations More recently researchers have generalized linearequations to nonseparable and nonparametric equations (see eg Chesher 20032005 Matzkin 2003 2007 and Altonji and Matzkin 2005) In 2015 Economet-ric Theory published a special issue in memory of Trygve Haavelmo (Volume 31Issue 01 and 02) which contains many interesting discussions on recent develop-ments of causality related to Haavelmorsquos structural models

Consider a simple case where the response of interest is Yi and the cause ofinterest is Di The causal relation between Di and Yi can be characterized by anunknown structural equation r such that

Yi = r(Di Ui )

where Ui is other unobservable causes of Yi For example when Yi is the demandDi is the price and Ui represents certain demand shocks r is the demand functionthat is derived from utility maximization r is a general function and does notneed to be linear parametric or separable between observable causes Di andunobservable causes Ui Here r has a structural or causal meaning and we definethe causal effect of Di on Yi based on r Let D and U be the support of Di andUi respectively If r (du) is a constant function of d for all d isin D and all u isin Uthen we simply say that Di does not structurally cause Yi (see eg Heckman2008 and White and Chalak 2009) For a binary Di the effect of Di on Yi isr(1u) minus r(0u) when Ui = u The effect can depend on unobservable u thusunobservable heterogeneity is allowed For a continuous Di the marginal effectof Di on Yi is partr(du)partd when Di = d and Ui = u To identify the effects of Di

on Yi we often impose the assumption of conditional exogeneity

Di perp Ui | Xi

where Xi is some observable covariates This includes the special case where Di

and Ui are independent corresponding to Di being randomizedHalbert White has made substantial contribution on defining identifying and

estimating causal effects in structural equations White and Chalak (2009) ex-tend Judea Pearlrsquos causal model to a settable system which incorporates features

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

268 XUN LU ET AL

of central interest to economists and econometricians optimization equilibriumand learning Roughly speaking a settable system is ldquoa mathematical frameworkdescribing an environment in which multiple agents interact under uncertaintyrdquo(White and Chalak 2009 p 1760) In the settable system a variable of interesthas two roles ldquoresponsesrdquo and ldquosettingsrdquo When the value of the variable is deter-mined by the structural equation these values are called ldquoresponsesrdquo In contrastwhen the value is not determined by the structural equation but is instead set toone of its admissible values these values are called ldquosettingsrdquo They show thaton the settable system causes and effects can be rigorously defined Chalak andWhite (2012) provide definitions of direct indirect and total causality on the set-table system in terms of functional dependence and show how causal relationsand conditional independence are connected Chalak and White (2011) providean exhaustive characterization of potentially identifying conditional exogeneityrelationships in liner structural equations and introduce conditioning and condi-tional extended instrumental variables to identify causal effects White and Chalak(2013) provide a detailed discussion on the identification and identification failureof causal effects in structural equations White and Chalak (2010) discuss how totest the conditional exogeneity assumption

The treatment effect literature adopts the potential outcome framework (seeeg Rubin 1974 Rosenbaum and Rubin 1983 and Holland 1986) For a causeof interest Di and response of interest Yi we define a collection of potential out-comes Yi (d) d isin D where D is the support of Di Then we can define causalor treatment effect based on the potential outcomes Specifically for two valuesd and dlowast in D we define Yi (dlowast)minus Yi (d) as the causal effects of Di on Yi whenDi changes from d to dlowast If Yi (d) = Yi (dlowast) for all d and dlowast isin D we say thatDi has no causal effects on Yi For example when Di is binary ie D = 01Yi (0) and Yi (1) are the potential outcomes corresponding Di = 0 and Di = 1respectively The treatment or causal effect is Yi (1)minus Yi (0) Two average effectsoften discussed in the literature are

the average treatment effect (ATE) E(Yi (1)minusYi (0))

and

the average treatment effect on the treated (ATT) E(Yi (1)minusYi (0) | Di = 1)

Note that we only observe one outcome in data For example when Di is binarythe observed outcome is simply Yi = Di Yi (1) + (1 minus Di )Yi (0) To identify theeffects we often impose the assumption of unconfoundedness or selection onobservables

Yi (d) perp Di | Xi

where Xi is observable covariates Lechner (2001) Imbens (2004) Angrist andPischke (2009) and Imbens and Wooldridge (2009) provide reviews on identi-fication and estimation of various effects in the potential outcome framework

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 5: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 267

are in general not equivalent For an excellent review on the relationship betweenGminusnoncausality and Sims noncausality see Kuersteiner (2008) Similar toGminusnoncausality Sims noncausality is completely based on predictability and hasno structural interpretation

22 Structural causality and causality in the potential outcomeframework

On the other hand causality defined on structural equations or potential outcomesis mostly discussed in cross-section data The structural equation approach canbe traced back to the work of the Cowles Commission in the 1940s (see egHaavelmo 1943 1944 and Koopmans 1950) though most of their work wasbased on linear equations More recently researchers have generalized linearequations to nonseparable and nonparametric equations (see eg Chesher 20032005 Matzkin 2003 2007 and Altonji and Matzkin 2005) In 2015 Economet-ric Theory published a special issue in memory of Trygve Haavelmo (Volume 31Issue 01 and 02) which contains many interesting discussions on recent develop-ments of causality related to Haavelmorsquos structural models

Consider a simple case where the response of interest is Yi and the cause ofinterest is Di The causal relation between Di and Yi can be characterized by anunknown structural equation r such that

Yi = r(Di Ui )

where Ui is other unobservable causes of Yi For example when Yi is the demandDi is the price and Ui represents certain demand shocks r is the demand functionthat is derived from utility maximization r is a general function and does notneed to be linear parametric or separable between observable causes Di andunobservable causes Ui Here r has a structural or causal meaning and we definethe causal effect of Di on Yi based on r Let D and U be the support of Di andUi respectively If r (du) is a constant function of d for all d isin D and all u isin Uthen we simply say that Di does not structurally cause Yi (see eg Heckman2008 and White and Chalak 2009) For a binary Di the effect of Di on Yi isr(1u) minus r(0u) when Ui = u The effect can depend on unobservable u thusunobservable heterogeneity is allowed For a continuous Di the marginal effectof Di on Yi is partr(du)partd when Di = d and Ui = u To identify the effects of Di

on Yi we often impose the assumption of conditional exogeneity

Di perp Ui | Xi

where Xi is some observable covariates This includes the special case where Di

and Ui are independent corresponding to Di being randomizedHalbert White has made substantial contribution on defining identifying and

estimating causal effects in structural equations White and Chalak (2009) ex-tend Judea Pearlrsquos causal model to a settable system which incorporates features

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

268 XUN LU ET AL

of central interest to economists and econometricians optimization equilibriumand learning Roughly speaking a settable system is ldquoa mathematical frameworkdescribing an environment in which multiple agents interact under uncertaintyrdquo(White and Chalak 2009 p 1760) In the settable system a variable of interesthas two roles ldquoresponsesrdquo and ldquosettingsrdquo When the value of the variable is deter-mined by the structural equation these values are called ldquoresponsesrdquo In contrastwhen the value is not determined by the structural equation but is instead set toone of its admissible values these values are called ldquosettingsrdquo They show thaton the settable system causes and effects can be rigorously defined Chalak andWhite (2012) provide definitions of direct indirect and total causality on the set-table system in terms of functional dependence and show how causal relationsand conditional independence are connected Chalak and White (2011) providean exhaustive characterization of potentially identifying conditional exogeneityrelationships in liner structural equations and introduce conditioning and condi-tional extended instrumental variables to identify causal effects White and Chalak(2013) provide a detailed discussion on the identification and identification failureof causal effects in structural equations White and Chalak (2010) discuss how totest the conditional exogeneity assumption

The treatment effect literature adopts the potential outcome framework (seeeg Rubin 1974 Rosenbaum and Rubin 1983 and Holland 1986) For a causeof interest Di and response of interest Yi we define a collection of potential out-comes Yi (d) d isin D where D is the support of Di Then we can define causalor treatment effect based on the potential outcomes Specifically for two valuesd and dlowast in D we define Yi (dlowast)minus Yi (d) as the causal effects of Di on Yi whenDi changes from d to dlowast If Yi (d) = Yi (dlowast) for all d and dlowast isin D we say thatDi has no causal effects on Yi For example when Di is binary ie D = 01Yi (0) and Yi (1) are the potential outcomes corresponding Di = 0 and Di = 1respectively The treatment or causal effect is Yi (1)minus Yi (0) Two average effectsoften discussed in the literature are

the average treatment effect (ATE) E(Yi (1)minusYi (0))

and

the average treatment effect on the treated (ATT) E(Yi (1)minusYi (0) | Di = 1)

Note that we only observe one outcome in data For example when Di is binarythe observed outcome is simply Yi = Di Yi (1) + (1 minus Di )Yi (0) To identify theeffects we often impose the assumption of unconfoundedness or selection onobservables

Yi (d) perp Di | Xi

where Xi is observable covariates Lechner (2001) Imbens (2004) Angrist andPischke (2009) and Imbens and Wooldridge (2009) provide reviews on identi-fication and estimation of various effects in the potential outcome framework

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 6: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

268 XUN LU ET AL

of central interest to economists and econometricians optimization equilibriumand learning Roughly speaking a settable system is ldquoa mathematical frameworkdescribing an environment in which multiple agents interact under uncertaintyrdquo(White and Chalak 2009 p 1760) In the settable system a variable of interesthas two roles ldquoresponsesrdquo and ldquosettingsrdquo When the value of the variable is deter-mined by the structural equation these values are called ldquoresponsesrdquo In contrastwhen the value is not determined by the structural equation but is instead set toone of its admissible values these values are called ldquosettingsrdquo They show thaton the settable system causes and effects can be rigorously defined Chalak andWhite (2012) provide definitions of direct indirect and total causality on the set-table system in terms of functional dependence and show how causal relationsand conditional independence are connected Chalak and White (2011) providean exhaustive characterization of potentially identifying conditional exogeneityrelationships in liner structural equations and introduce conditioning and condi-tional extended instrumental variables to identify causal effects White and Chalak(2013) provide a detailed discussion on the identification and identification failureof causal effects in structural equations White and Chalak (2010) discuss how totest the conditional exogeneity assumption

The treatment effect literature adopts the potential outcome framework (seeeg Rubin 1974 Rosenbaum and Rubin 1983 and Holland 1986) For a causeof interest Di and response of interest Yi we define a collection of potential out-comes Yi (d) d isin D where D is the support of Di Then we can define causalor treatment effect based on the potential outcomes Specifically for two valuesd and dlowast in D we define Yi (dlowast)minus Yi (d) as the causal effects of Di on Yi whenDi changes from d to dlowast If Yi (d) = Yi (dlowast) for all d and dlowast isin D we say thatDi has no causal effects on Yi For example when Di is binary ie D = 01Yi (0) and Yi (1) are the potential outcomes corresponding Di = 0 and Di = 1respectively The treatment or causal effect is Yi (1)minus Yi (0) Two average effectsoften discussed in the literature are

the average treatment effect (ATE) E(Yi (1)minusYi (0))

and

the average treatment effect on the treated (ATT) E(Yi (1)minusYi (0) | Di = 1)

Note that we only observe one outcome in data For example when Di is binarythe observed outcome is simply Yi = Di Yi (1) + (1 minus Di )Yi (0) To identify theeffects we often impose the assumption of unconfoundedness or selection onobservables

Yi (d) perp Di | Xi

where Xi is observable covariates Lechner (2001) Imbens (2004) Angrist andPischke (2009) and Imbens and Wooldridge (2009) provide reviews on identi-fication and estimation of various effects in the potential outcome framework

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 7: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 269

White and Chalak (2013 p 280) show that the structural equation framework isequivalent to the potential outcome framework For example the potential out-come Yi (d) is simply r(dUi ) The unconfoundedness assumption Yi (d) perp Di |Xi in the potential outcome framework is equivalent to the conditional exogeneityassumption Ui perp Di | Xi in the structural equation framework

23 Relationship between these two types of causality

We emphasize that Gminuscausality and Sims causality are entirely based on pre-dictability while the causality defined on structural equations or potential out-comes is a real causal relation In the literature there are several papers thatlink these two types of causality White and Lu (2011) provide a direct linkbetween Gminuscausality and structural causality in structural equations They showthat given the conditional exogeneity assumption structural noncausality isessentially equivalent to Gminusnoncausality Angrist and Kuersteiner (2004 2011)and Kuersteiner (2008) discuss the link between Sims noncausality and thenoncausality defined in the potential outcome framework under the assumption ofselection on observables Angrist Jorda and Kuersteiner (2013) study the mon-etary policy effects in the potential outcome framework Lechner (2011) linksGrangerSims noncausality to various average effects defined in the potentialoutcome framework However all the discussions so far have been made in thecontext of time series data

3 STRUCTURAL CAUSALITY AND GminusCAUSALITY INCROSS-SECTION DATA

31 Structural causality

We first specify a population DGP for a single time period omitting the timeindex We write N+ =12 and N = 0cupN+ We also write N+ = N+ cupinfin and N = 0cup N+

Assumption A1 (Cross-Section Heterogeneous Population DGP) Let(F P) be a complete probability space let N isin N+ and for the membersof a population j isin 1 N let the random vector Yj be structurally determinedby a triangular system as

Yj = r(Dj Zj Uj bj

) (31)

where r is an unknown measurable ky times 1 function ky isin N+ Dj Zj and Uj j = 12 are vectors of nondegenerate random variables on (F P) havingdimensions kd isin N+ kz isin N and ku isin N respectively and bj is a nonrandomreal vector of dimension kb isin N Suppose also that Wj is a random vector on(F P) with dimension kw isin N The triangular structure is such that Yj doesnot structurally determine Dj Zj and Uj and neither Yj nor Dj structurallydetermines Wj

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 8: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

270 XUN LU ET AL

To keep causal concepts clear it is important to distinguish between the popu-lation DGP defined above and the sample DGP defined later For now we leavesampling aside We may also refer to members of the population as ldquounitsrdquo orldquoindividualsrdquo and the distribution of the population as a mixture distributionMixture distributions appear in many contexts in the literature and arise natu-rally where a statistical population contains two or more subpopulations (see egLindsay 1995 and Mclachlan and Basford 1988) As will be clear in a momentthe sources of population heterogeneity are in general not observed

We interpret Yj as the response of interest It is determined by variables Dj Zj and Uj and the constants bj The variables can be binary categorical orcontinuous Immediately below we formalize a natural notion of causality for thissystem For now it is heuristically appropriate to view Dj as observable causes ofinterest (eg a treatment) Zj as other observable causes not of primary interestand Uj as unobservable causes3

The structural function r is unknown and our goal will be to learn about itfrom a sample of the population Nevertheless when Wj has positive dimensionr embodies the a priori exclusion restriction that Wj does not determine Yj Thetypical sources of this restriction as well as the identities of Dj Zj and Uj theirstatus as observable or not and the priority or precedence relations embodied inthe assumed triangularity are economic theory and specific domain knowledge

The assumed triangularity rules out explicit simultaneity for succinctness andclarity For the causality in simultaneous structural equations there are differentviews in the literature White and Chalak (2009 2013) follow Strotz and Wold(1960) and argue that simultaneous equations are not causal structural relationsbut are instead ldquomutual consistency conditions holding between distinct sets ofstructural equations ndash for example between one set of structural equations govern-ing partial equilibrium and another governing full equilibriumrdquo However otherresearchers argue that simultaneous structural equations can be given a causalinterpretation For example Angrist Graddy and Imbens (2000) provide a poten-tial outcome interpretation for general nonlinear simultaneous equation modelsThey also show that the standard linear instrumental variable estimator identifiesa weighted average of the causal effects

By definition the constants bj are fixed for a given individual Neverthelessthey may vary across individuals we call this variation structural heterogeneityIf the bj rsquos are identical we write them as b0 We assume that bj rsquos are unknownNote that the presence of bj facilitates writing r without a j subscript as differ-ences in the structural relations across population members can be accommodatedby variations in the possibly infinite-dimensional bj

To give a definition of causality for the structural system in Assumption A1 letDj = supp(Dj ) denote the support of Dj ie the smallest closed set containingDj with probability 1

DEFINITION 31 (Cross-Section Structural Causality) Let j be given If thefunction r(middot zu bj ) Dj rarr R

ky is constant on Dj for all admissible z and u

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 9: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 271

then Dj does not structurally cause Yj and we write Dj rArrS Yj Otherwise Dj

structurally causes Yj and we write Dj rArrS Yj

Here we implicitly assume that there is variation in potential cause Dj (ie Dj

is nondegenerated) and under the set of counterfactual policies Dj the structuralfunction r is invariant Therefore the invariant function r fully characterizes thecausalstructural relationship between Yj and (Dj Zj Uj )

Structural causality is structural functional dependence in a specific contextSimilar definitions can be given for Zj and Uj but we leave these implicitas these are not the main causes of interest Causality for components of Dj

is defined in an obvious way If we define Yj (d) = r(d Zj Uj bj

)as the

potential outcomes (see eg Rubin 1974 Rosenbaum and Rubin 1983 andHolland 1986) then the equivalent definition of structural noncausality is thatYj (d) = Yj (dlowast) as for all (ddlowast) isin Dj

Structural causality can be easily understood in the familiar linear structure forscalar Yj

Yj = bj0 + Dprimej bj1 + Z prime

j bj2 +Uj

where bj = (bj0bprimej1bprime

j2)prime If bj1 = 0 then Dj does not structurally cause Yj

Otherwise it does This is so natural and intuitive that one might wonder whycausality is such a thorny topic One main reason for confusion arises from therelation between causality and simultaneity as discussed above the triangularsystems considered here obviate this issue Another main reason for confusionis the failure to distinguish carefully between structural equations of the sort writ-ten above and regression equations which may look similar but need not havestructural content The equation above is entirely structural We will encounterregressions (conditional expectations) only after suitable structural foundationsare in place

Observe that heterogeneity in bj rsquos permits Dj to cause Yj for some j and notfor others

The nondegeneracy of Dj ensures that Dj contains at least two points so Dj

is a variable rather than a constant Variation in potential causes is fundamentalfor defining causality (cf Holland 1986) this is what makes possible analogousdefinitions of causality for Zj and Uj Significantly however because bj rsquos arefixed constants they cannot be causes of Yj Instead bj rsquos can be effects or candetermine effects This follows from the following formal definition

DEFINITION 32 (Intervention and Effect) Let j be given and let d and dlowastbe distinct admissible values for Dj Then d rarr dlowast is an intervention to Dj Let(zu) be admissible values for (Zj Uj ) The difference ylowast

j minus yj = r(dlowast zu bj

)minusr(d zu bj

)is the effect on Yj of the intervention d rarr dlowast to Dj at (zu)

This is also referred to as the ldquocausal effectrdquo in the treatment effect literature(see eg Rubin 1974) For the linear case with scalar d the effect of a one-unit

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 10: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

272 XUN LU ET AL

intervention d rarr d +1 is

ylowastj minus yj = (

bj0 + (d +1)bj1 + zprimebj2 +u)minus (

bj0 +d bj1 + zprimebj2 +u) = bj1

For unit j this effect is fixed ie it is the same constant for all (zu) Becausebj can differ across units this permits effect heterogeneity

Alternatively if Yj = bj0 + Zj Dprimej bj1 +Uj with scalar Zj then the effect of

d rarr dlowast is

ylowastj minus yj = (

bj0 + zdlowastprimebj1 +u)minus (

bj0 + zd primebj1 +u) = z

(dlowast minusd

)primebj1

Here the effect of Dj on Yj depends on z but not u In this case bj1 determinesthe effect of Dj on Yj together with Zj When an effect depends on a variable(ie an element of Zj or Uj ) it is standard (especially in the epidemiologicalliterature) to call that variable an effect modifier

For simplicity we call ylowastj minus yj the effect of d rarr dlowast Chalak and White (2012)

discuss indirect direct and total effects in structural equations In the potentialoutcome framework Rubin (2004) discusses ldquodirectrdquo and ldquoindirectrdquo casual effectusing principal stratification

We distinguish the structural causal effect from various average causal effectssuch as E[r(dlowast Zj Uj bj ) minus r(d Zj Uj bj )] It is clear that structural non-causality implies zero average causal effects while the converse is not necessarilytrue

To describe identified effects in samples from heterogeneous populationswe define population groups Jg g = 1 as collections of populationunits having identical bj and identical distributions of (Dj Uj ) | X j whereX j = (Z prime

j W primej )

prime We define G = 1 As the population is finite so is thenumber of groups le N We define Ng = j isinJg where middot is the cardinal-ity of the indicated set and let pg = NgN be the proportion of population unitsbelonging to group Jg The bj rsquos by themselves need not define groups as thedistributions of (Dj Uj ) | X j may differ for units with identical bj We call cross-group variation in the distributions of (Dj Uj ) | X j distributional heterogeneityto distinguish this from structural heterogeneity that refers to cross-group varia-tion in bj

Note that because groups are defined by unobservable constants bj and dis-tributions involving the unobservable Uj in general we do not know for surewhether two units belong to the same group Interestingly in the related litera-ture Bester and Hansen (2016) consider estimating grouped effects in panel datamodels when each individualrsquos group identity is known and Su Shi and Phillips(2014) consider identifying latent group structures in panel data models via a vari-ant of Lasso Nevertheless both groups of researchers have only focused on thestructural heterogeneity in the panel framework

Given A1 it follows that (Yj Dj ) | X j is identically distributed for all unitsj in group Jg As a convenient shorthand for j isin Jg we write (Yj Dj ) | X j sim(Yg Dg) | Xg where A sim B denotes that A is distributed as B

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 11: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 273

Typically we do not observe an entire population Instead we observe observa-tions sampled from the population in some way For simplicity and concretenesswe consider simple random sampling At the sample level we write4

Y = r (D Z U B)

where B is the randomly sampled bj Note that the randomness of B arises solelyfrom the sampling process We write X = (Z primeW prime)prime and distinguish X from thecauses D by calling X a vector of covariates

For any random vector X we let f (x) and F (x) denote the joint prob-ability density function (PDF) and cumulative distribution function (CDF)respectively For any two random vectors X and Y we use f (y|x) and F (y|x)to denote the conditional PDF and CDF of Y given X = x respectively Wealso use subscript j and g to denote individual j and group g respectivelyFor example Fg(ud x) and fg (u|d x) denotes the CDF of (Ug Dg Xg)and the conditional PDF of Ug given Dg Xg for group g respectively Notethat Fg(ud x) = Nminus1

g

sumjisinJg Fj (ud x) which defines the mixture CDF of

(Ug Dg Xg)

32 Gminuscausality a first encounterWe now define Gminuscausality for cross-section data based on Grangerrsquos philoso-phy of nonpredictability Holland (1986 p 957) considers a special case whereD is randomized and applies Gminuscausality to cross-section data5 As in Hollandwe define Gminusnon-causality as a conditional independent statement FollowingDawid (1979) we write X perp Y | Z to denote that X and Y are independentgiven Z and X perp Y | Z if X and Y are not independent given Z Translat-ing Granger and Newboldrsquos (1986 p 221) definition to the cross-section contextgives

DEFINITION 33 (GminusCausality) Let Y Q and S be random vectors IfQ perp Y | S then Q does not Gminuscause Y with respect to (wrt) S OtherwiseQ Gminuscauses Y wrt S

Here Y Q and S can be any random vectors and this notion has no structuralcontent When S is a constant we have the simplest form of Gminusnon-causalityindependence Correlation is an example of Gminuscausality

To give Gminuscausality structural meaning we impose A1 assume that D obeysa suitable exogeneity condition and take Q = D and S = X The exogeneityassumed for D ensures identification for various measures of its effects and canoften be structurally justified It also turns out to be ideally suited to endowingGminuscausality with structural meaning

To see how this works in a simple setting consider the homogeneous case withD being a binary scalar in which average treatment effects (AT E) and averagetreatment effects on the treated (AT T ) are often discussed (see eg Rubin 1974

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 12: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

274 XUN LU ET AL

Hahn 1998 Hirano Imbens and Ridder 2003 and Angrist and Pischke 2009)Define the covariate-conditioned effect of treatment on the treated as

AT TX = E (Y (1)minusY (0) | D = 1 X)

where Y (1) is the potential response to treatment and Y (0) is the potentialresponse in the absence of treatment Using A1 we write Y (1) = r(1 Z U b0)and Y (0) = r(0 Z U b0) AT TX thus has a clear structural interpretation Fromthis we can construct AT T

AT T = E (Y (1)minusY (0) | D = 1) = E(AT TX | D = 1)

Note that here we define AT T and AT TX based on our structural equation r Nevertheless they can be defined without a structural model ie based on thetwo potential outcomes Y (0) and Y (1) (see eg Rubin 1974 and Angrist andPischke 2009)

To identify AT TX it suffices that D perp Y (0) | X Given A1 it suffices forthis that D perp U | X as is readily verified This conditional exogeneity is acommon identifying assumption in cross-sections6 Classical strict exogeneity((D Z) perp U ) is sufficient but not necessary for this Using D perp U | X in thesecond line below we have

AT TX = E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 1 X ]

= E [r(1 Z U b0) | D = 1 X ]minus E [r(0 Z U b0) | D = 0 X ]

= E(Y | D = 1 X)minus E(Y | D = 0 X) = μ(1 X)minusμ(0 X)

Thus AT TX can be expressed in terms of the distribution of observables (hereμ(1 X) and μ(0 X)) so this effect is identified (eg Hurwicz 1950) Simi-larly AT T is identified The identification result based on conditional exogeneityhas been extensively discussed in the treatment effect literature (see eg Rubin1974 Holland 1986 and Angrist and Pischke 2009)

To see the structural meaning for Gminuscausality observe that E(Y | D = 1 X)minusE(Y | D = 0 X) = 0 as is another way to write E(Y | D X) = E(Y | X) asThus in this context Gminusnon-causality in mean of D for Y wrt X is equivalentto AT TX = 0 as

Note that even for binary treatments Gminuscausality in mean does not tell thefull story as Gminuscausality in distribution can hold even when AT TX or AT Tvanishes For example suppose

Y = U D

with D perp U E(U ) = 0 E(U 2) lt infin and E(D2) lt infin Then D structurallycauses Y its effect on Y is U a random effect These effects may be posi-tive or negative however they average out as AT T = 07 Here and as we willshow generally Gminusnon-causality essentially has the interpretation that Y is not

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 13: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 275

structurally caused by D under the key conditional exogeneity assumption Inthis example we have Gminuscausality (D perp Y ) as E(Y 2 | D) = D2 E(U 2) TestingGminusnon-causality will detect the structural causality here in contrast to testingAT T = 0 or equivalently Gminusnon-causality in mean of D for Y

When D is nonbinary A1 together with suitable exogeneity conditionssimilarly suffices to identify certain effects that give structural meaning toGminuscausality This makes it generally possible to test for structural causality bytesting for Gminuscausality

With heterogeneity matters become somewhat more complicated In particularin the binary case μ(1 X)minusμ(0 X) is no longer AT TX Instead with suitableexogeneity (Assumption A2 below) we have that for each x μ(1 x)minusμ(0 x)recovers a blended effect

μ(1 x)minusμ(0 x) =sumg=1

ζg|x AT Tg|x

where the ζg|x rsquos are nonnegative weights adding to one and for all j isin JgAT Tg|x = E[Yj (1) minus Yj (0) | Dj = 1 X j = x] is the group-specific covariate-conditioned average effect of treatment on the treated Here Yj (1) is the poten-tial response to treatment and Yj (0) is the potential response in the absence oftreatment

33 Exogeneity and effect identification with heterogeneity

With heterogeneity an identifying exogeneity condition relevant for analyzingGminuscausality is

Assumption A2 (Heterogeneous Cross-Section Exogeneity) For all g isin G(i) Dg | Xg sim D | X and (i i) Dg perp Ug | Xg

Observe that this imposes structures for all groups namely all g isin G Althoughthis condition is not the weakest possible (see Chalak and White (2012) and thediscussion below) A2 generally suffices to identify effects and link Gminuscausalityand structural causality A2(i) restricts the allowed distributional heterogeneityfor all g isin G the conditional distributions of Dg given Xg are assumed identicalA2(i i) imposes conditional exogeneity for all groups Below we discuss thisfurther

THEOREM 31 Suppose A1 - A2(i) hold If A2(i i) holds then D perp U | X

It is also of interest to know whether the converse holds A strict conversedoes not hold as certain fortuitous cancellations in population-group condi-tional distributions can yield sample conditional exogeneity without population-group conditional exogeneity Nevertheless the converse does hold under a mildregularity condition ruling out exceptional cases We introduce the followingdefinition

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 14: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

276 XUN LU ET AL

DEFINITION 34 Ug | Dg Xg g isin G is regular ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x) (32)

implies Ug perp Dg | Xg for all g isin GWhen G = 1 Ug | Dg Xg g isin G is necessarily regular To understand whatregularity rules out consider the next simplest case with G = 2 and Xg absentThen Ug | Dg g isin G fails to be regular if and only if U2 perp D2 (say) and

f1(u | d) = f1(u)minus 1minus p1

p1[ f2(u | d)minus f2(u)] for all (ud)

where the two groups have been re-indexed for convenience Suppose that p1f1(u) f2(u) and f2(u | d) are arbitrary Then it is easily arranged that f1(u | d)can be negative for (ud) in a set of positive probability so f1(u | d) does notdefine a conditional density and Ug | Dg g isin G is regular after all If this f1(u |d) is nevertheless a conditional density it is clearly highly special as it ensuresthat eq(32) holds for the given p1 and the functions defined by f1(u) f2(u) andf2(u | d) (all of which are typically unknown) but not necessarily otherwise

In the general case Ug | Dg Xg g isin G fails to be regular if and only ifUg perp Dg | Xg for some g isin G and

f1(u | d x) = f1(u | x)minus (p1 f1(x))minus1sumg=2

pg [ fg(u | d x) minus fg(u | x)] fg(x)

times for all (ud x)

where we again re-index the groups As before this is clearly a very special pop-ulation configuration ruling out such cases by imposing regularity is a weak re-striction

The converse result is

THEOREM 32 Suppose A1 - A2(i) hold and suppose Ug | Dg Xgg isin Gis regular If D perp U | X then A2(i i) holds

Theorems 31-32 also hold with Yg | Dg Xg g isinG regular D perp U | X replacedby D perp Y | X and A2(i i) replaced by Dg perp Yg | Xg g isin G This result plays acentral role in linking structural causality and Gminuscausality

34 Linking structural causality and GminuscausalityAs we have seen A1 imposes structures where causal effects are well definedA2 permits recovering versions of these As we now prove this gives structuralmeaning to Gminuscausality generally Recall that the bj rsquos are identical in group Jgthus the same structural causality relations hold for all j in a given group Wewrite Dg rArrS Yg when Dj rArrS Yj for (all) j in group Jg Our next result showsthat structural noncausality implies Gminusnon-causality

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 15: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 277

PROPOSITION 33 Let A1 - A2 hold Then structural noncausality(Dg rArrS Yg forallg isinG) implies that D does not Gminuscause Y wrt X ie D perp Y | X

This result is intuitive when structural effects are well defined and identifiableGminusnon-causality follows from structural noncausality Holland (1986 p 958)gives a similar result for the random experiment case Our Proposition 33 jus-tifies using tests of Gminusnoncausality to test structural noncausality if we rejectGminusnoncausality we must reject structural noncausality

As WL show structural causality does not imply Gminuscausality Without furtherassumptions the concepts are not equivalent Nevertheless if as is commonlyassumed in the literature (see WL for discussion) r obeys separability betweenobservable causes D and unobservable causes U or r obeys a specific form ofmonotonicity in U then with suitable regularity structural causality does implyGminuscausality We have

THEOREM 34 Given A1 - A2 suppose Yg | Dg Xgg isin G is regularSuppose further that for all g isin G either (i) or (i i) holds

(i) For all j isin Jg for unknown measurable functions r1 and r2

Yj = r1(Dj Zj bj

)+ r2(Zj Uj bj

) (33)

(ii) For all j isinJg for = 1 ky r(d zu bj ) = r0(d zu bj ) for scalaru where r0(d z middot bj ) is strictly monotone increasing for each admissi-ble (d z) and Fj(y | d x) = P[Yj le y | Dj = d X j = x] is strictlymonotone in y for each admissible (d x)

Then structural causality (for some g isin G Dj rArrS Yj for (all) j in Jg) impliesthat D Gminuscauses Y wrt X ie D perp Y | X

Kasy (2011) gives a discussion of structures satisfying (i i) Even without sep-arability or monotonicity there is an equivalence between Gminuscausality and astronger notion of structural causality that handles certain exceptional cases wherethe causal structure and the conditional distribution of (Uj Dj ) given X j inter-act in just the right way to hide the structural causality WL call this strongernotion structural causality with positive probability and provide discussions Welet Yj = supp(Yj ) and Xj = supp(X j )

DEFINITION 35 (Structural Causality with Positive Probability) SupposeA1 holds and let j be given Suppose that for each y isin Yj there exists a mea-surable function f jy Xj rarr [01] such thatint

1r(Dj Zj u bj

)lt y

d Fj

(u | X j

) = f jy(X j

)as (34)

where Fj (u | X j ) = E[1Uj le u | X j ] Then Dj does not structurally cause Yj

almost surely (as) wrt X j (Dj rArrS(X j ) Yj ) Otherwise Dj structurally causesYj with positive probability (wpp) wrt X j

(Dj rArrS(X j ) Yj

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 16: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

278 XUN LU ET AL

By the definition of a group the same structural causality wpp relations holdfor all j in a given group We thus write Dg rArrS(Xg) Yg when Dj rArrS(X j ) Yj holdsfor all j in group Jg

Below is an example in WL in which Dj structurally causes Yj while Dj doesnot structurally cause Yj as Consider the structural equation r

Yj = r(Dj Uj

) = DjradicD2

j +1Uj1 + 1radic

D2j +1

Uj2

where Uj equiv (Uj1Uj2) and Dj are all N (01) random variables and Dj U1 j andU2 j are mutually independent For simplicity there is no bj Zj or X j It is clearthat Dj structurally causes Yj here Nevertheless Dj does not structurally causeYj as To see this note that Yj and Dj are independent Thus the LHS of eq(34)becomesint

1r(Dj u1u2

)lt y

d Fj (u1u2)

=int

1r(Dj u1u2

)lt y

d Fj

(u1u2|Dj

) = Pr[Yj lt y|Dj

] = (y)

where Fj (middot) and Fj(middot|Dj

)denote the CDF of Uj and the conditional CDF of Uj

given Dj and is the standard normal CDF Thus the RHS does not depend onDj ie Dj does not structurally cause Yj as

Theorem 35 below gives a result linking Gminuscausality and structural causalityas

THEOREM 35 Suppose A1 - A2 hold Suppose also that Yg | Dg Xgg isinG is regular If Dj structurally causes Yj wpp wrt X j (for some g isin GDj rArrS(X j ) Yj for all j in Jg) then D Gminuscauses Y wrt X ie D perp Y | X

Together Proposition 33 and Theorem 35 establish that cross-sectionGminuscausality and structural causality are essentially equivalent under the key con-ditional exogeneity assumption We say ldquoessentiallyrdquo as Theorem 35 rules outthe two ways that Gminusnon-causality can mask structural causality The first arisesfrom subtle interactions between the causal structure and the conditional distribu-tion of (Uj Dj ) given X j The second arises under heterogeneity from delicatecancellations among the conditional distributions of Yj given Dj and X j acrossgroups These exceptional possibilities only trivially mitigate the power of testsfor Gminuscausality as tests for structural causality Errors of interpretation and in-ference need not result as long as these exceptions are recognized

35 Identification and identification failure

To gain further insight into the relation between Gminuscausality and effect identi-fication we now undertake a deeper analysis of Gminuscausality in mean For this

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 17: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 279

let qg(d x) = fg(d x)sumhisinG (ph fh(d x)) Under A1 the law of iterated ex-

pectations gives

E (Y | D = d X = x) =sumgisinG

pg qg(d x)

intr(d zu bg) d Fg(u | d x)

For concreteness consider identifying average marginal effects for a continu-ous treatment d To allow both discrete and continuous Ug let νg(u | d x) definea σminusfinite measure dominating that defined by Fg(u | d x) and let fg(u | d x)define the associated Radon-Nikodym density for Ug given (Dg Xg) Assum-ing differentiability and mild regularity derivations parallel to those of White andChalak (2013) give

part

partdE(Y | D = d X = x)

=sumgisinG

pg qg(d x)

intpart

partdr(d zu bg) fg(u | d x) dνg(u | d x)

+sumgisinG

pg qg(d x)

int[r(d zu bg)

part

partdln fg(u | d x)]

times fg(u | d x) dνg(u | d x)

+sumgisinG

pgpart

partdqg(d x)

intr(d zu bg) fg(u | d x)dνg(u | d x)

After some manipulation this regression derivative can be written as

part

partdE (Y | D = d X = x) = β(d x)+ δ1(d x)+ δ2(d x) (35)

The first of the three terms on the right is a weighted average marginal effect

β(d x) =sumgisinG

pg qg(d x)βg(d x) where βg(d x)

= E[(partpartd)r(Dg ZgUg bg) | Dg = d Xg = x

]is the covariate-conditioned average marginal effect of Dj on Yj at (Dj = d X j =x) for j isin Jg g isin G Thus when the other two terms vanish the regressionderivative identifies the structurally meaningful weighted effect β(d x) andwhen β(d x) is nonzero for (d x) in a set of positive probability we haveGminuscausality in mean of D for Y wrt X When the other two terms do notvanish we generally have identification failure and there is no necessary linkbetween Gminuscausality and structural causality

The second term is a pure endogeneity bias

δ1(d x) =sumgisinG

pg qg(d x) E(εg Sg | Dg = d Xg = x

)

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 18: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

280 XUN LU ET AL

where εg = Ygminus E(Yg | Dg = d Xg = x

)is the regression residual for the given

group and Sg = (partpartd) ln fg(Ug | Dg Xg) is the exogeneity score of White andChalak (2013) Observe that E(εg Sg | Dg = d Xg = x) represents a specificform of omitted variable bias where the exogeneity score is the omitted variableA sufficient (but not necessary) condition for this bias to vanish is A2(i i) Dg perpUg | Xg for all g isin G

The third term is a pure heterogeneity bias

δ2(d x) =sumgisinG

pgpart

partdqg(d x) E

(Yg | Dg = d Xg = x

)

due to heterogeneity of fg(d | x) across members of G8 A sufficient (but notnecessary) condition for this to vanish is A2(i) Dg | Xg sim D | X for all g isin G

Thus A2 ensures that the regression derivative identifies the weighted effect

β(d x) = βlowast(d x) =sumgisinG

pg qlowastg(x)βlowast

g(d x)

where qlowastg(x) = fg(x)

sumh (ph fh(x)) and βlowast

g(d x) = int(partpartd)r(d zu bg)

d Fg(u | x) This identification provides the link between Gminuscausality in meanand structural causality

Although A2 is not necessary for effect identification cases where identifica-tion holds in the absence of A2 are quite special analogous to failures of reg-ularity Thus for practical purposes A2 can be viewed as playing the key rolein identifying effects of interest and thereby linking Gminuscausality and structuralcausality

Further as Proposition 61 and Corollary 62 of WL show in the absence ofstructural causality Gminuscausality is essentially equivalent to exogeneity failureHere this is reflected in the fact that when βg(d x) vanishes for all g A1 andmild regularity conditions give

part

partdE (Y | D = d X = x) = δ1(d x)+ δ2(d x)

Thus with structural noncausality as Gminuscausality in mean implies the fail-ure of A2(i) A2(i i) or both conversely failure of A2 essentially ensuresGminuscausality in mean

36 Testing for Gminuscausality and structural causality in cross-sectiondata

As just seen the hypothesis of Gminusnon-causality and thus of structural noncausal-ity under the conditional exogeneity assumption is a specific conditional inde-pendence In the literature there are many conditional independence tests thatapply to IID data (eg see Delgado and Gonzalez-Manteiga 2001 Fernandes and

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 19: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 281

Flores 2001 Su and White 2007 2008 2014 Song 2009 Linton and Gozalo2014 Huang Sun and White 2016) These methods can be computationallychallenging as they are nonparametric Angrist and Kuersteiner (2004 2011)develop a semi-parametric test for conditional independence Based on the factthat Y perp D | X implies ψ1(Y ) perp ψ2(D) | X for any measurable vector functions(ψ1ψ2) WL propose several convenient regression-based tests

4 STRUCTURAL CAUSALITY AND GminusCAUSALITY IN PANEL DATAThe literature contains considerable discussion about Gminuscausality in the paneldata setting Nevertheless the focus is mainly on linear or parametric modelsFor example Chamberlain (1984) discusses Gminuscausality ldquoconditional on un-observables [lsquofixed effectrsquo]rdquo in a linear model and a logit model Holtz-EakinNewey and Rosen (1988) consider a traditional linear vector autoregression(VAR) and use GMM to test Gminuscausality Nair-Reichert and Weinhold (2001)discuss Gminuscausality in a dynamic mixed fixed- and random-coefficients linearmodel allowing heterogeneity across individuals Dumitrescu and Hurlin (2012)test for Gminuscausality in a linear dynamic panel model with fixed coefficients thatvary across individuals There is also a growing literature on nonlinear and non-separable panel models (see eg Hoderlein and White (2012) and the referencestherein) We are not aware that Gminuscausality has been discussed in such modelsso far

We emphasize that in panel data Gminuscausality is also entirely based on pre-dictability and has no causal interpretation In this section we link Gminuscausalityand structural causality in general panel structures Since the static panel case issimilar to the cross-sectional case we focus on a dynamic data generating processTo simplify notation for given y isin N+ we let Y jtminus1 denote the finite historyY jtminus1 = (Yjtminusy Yjtminus1)

Assumption B1 (Panel Population DGP) Let (F P) be a complete prob-ability space let N isin N+ and let y isin N+ For the members of a populationj isin 1 N let the random vectors Yjt t = 12 be structurally determinedby a triangular system as

Yjt = r(Y jtminus1 Djt Zjt Ujt bjt

) (41)

where r is an unknown measurable ky times 1 function ky isin N+ Yjτ (τ = 1 minusy 0) Djt (kd times1kd isin N+) Zjt (kz times1 kz isin N) and Ujt (ku times1 ku isin N)are vectors of nondegenerate random variables on (F P) and bjt is a non-random real vector of dimension kb isin N Suppose also that Wjt t = 12 arerandom vectors on (F P) with dimension kw isin N The triangular structureis such that for t = 12 neither Yjsinfins=1minusy

nor Djsinfins=1 structurally deter-mine Wjt Yjsinfins=1minusy

does not structurally determine Djt Zjt or Ujt andDjsinfins=1 does not structurally determine Zjt or Ujt

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 20: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

282 XUN LU ET AL

The interpretation of structural eq(41) is the same as that for eq(31) ex-cept that here we have multiple time periods t = 12 The elements of(Yjt Djt Zjt Wjt Ujt

)and bjt rsquos can contain both time-varying and time-

invariant elements For now we do not distinguish these The elements of(Djt Zjt Wjt Ujt ) can contain lags of underlying variables By permittingonly finite histories Y jtminus1 Djt Zjt we restrict attention to Markov-type datagenerating processes this greatly simplifies the analysis and corresponds to thestructures usually considered in practice

As in the cross-section case we can have both structural heterogeneity anddistributional heterogeneity We specify the relevant structural and distributionalheterogeneities below

We again impose random sampling from the population For simplicity weassume all time periods can be observed for every individual so we observe abalanced panel Similar to the cross-section case we write

Yt = r (Y tminus1 Dt Zt Ut Bt ) t = 1 T

by suppressing the cross-sectional subscript i We assume that we observe data(Yt Dt Zt Wt ) t = 1 T and relevant lags prior to t = 1

41 Linking structural causality and Gminuscausality in dynamic panelsWe first define structural causality for the dynamic panel structure Let Djt be thesupport of Djt

DEFINITION 41 (Structural Causality in Dynamic Panels) Let j and t begiven If the function r(y jtminus1 middot zjt ujt bjt ) Djt rarr R

ky is constant on Djt

for all admissible y jtminus1 zjt and ujt then Djt does not structurally cause Yjt and we write Djt rArrS Yjt Otherwise Djt structurally causes Yjt and we writeDjt rArrS Yjt

We let X jt = (Z primejt W prime

jt )prime denote the covariates at time t WL consider

retrospective conditional exogeneity and retrospective Gminuscausality (White andKennedy 2009) We also allow this For this we letX jt denote a covariate his-tory that may contain X jt and lags or leads of X jt as in Wooldridge (2005 p 41)or WL ie for some m isin N+

X jt = SX(X jtminusm X jt X jt+m

) t = 12

where SX is a given selection matrix We assume

X jt X jtminusm subeX jt and

allow X jt also to contain leads of X jt When Xjt contains leads we have theretrospective case as in WL We denote randomly sampled covariates X t Forsimplicity we also assume that all needed elements of X t are observable Tem-poral precedence plays an important role in panel data In particular comparedwith the cross-sectional case we now have a richer set of covariates (includingboth leads and lags) to choose from

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 21: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 283

To provide a link between structural causality and Gminuscausality for dynamicpanels we condition on not only covariates X t but also lags of Yt For thiswe denote Y jtminus1 = (Yjtminusmy Yjtminus1) where my isin N+ and my ge y ie

Y jtminus1 sube Y jtminus1 Similarly we let Y tminus1 be the randomly sampled Y jtminus1Similar to the cross-section case for each t we define population group Jgt

g isin 1 t as a collection of population units having identical bjt and iden-tical distribution of (Ujt Djt ) |X jt Y jtminus1 We also let Gt = 1 t Forgiven t we write Dgt rArrS Ygt to denote Djt rArrS Yjt for all j in group Jgt g isin Gt

Next we impose an exogeneity assumption analogous to A2

Assumption B2 (Heterogeneous Dynamic Panel Exogeneity) Let t be givenFor all g isin Gt (i) Dgt |Xgt Y gtminus1 sim Dt |X t Y tminus1 and (i i) Dgt perp Ugt |Xgt Y gtminus1

Here as before for all g isin Gt the conditional distributions of Dgt given(Xgt Y gtminus1) are assumed to be identical and we write the common distributionas Dt |X t Y tminus1

The analog of Theorem 31 relating population and sample conditional exo-geneity is

THEOREM 41 Suppose B1ndashB2(i) hold and let t be given If B2(i i) holdsthen Dt perp Ut |X t Y tminus1

Proposition 42 provides a link between structural noncausality and Gminusnon-causality

PROPOSITION 42 Suppose that B1 and B2 hold Let t be given Thenstructural noncausality (Dgt rArrS Ygt forallg isin Gt ) implies that Dt does notGminuscause Yt wrtX t Y tminus1 ie Dt perp Yt |X t Y tminus1

We can also define structural causality with positive probability for dynamicstructures and show that structural causality and Gminuscausality are ldquoessentiallyrdquoequivalent under the key conditional exogeneity assumption For brevity we omitthe details

So far we have not carefully distinguished between time-varying and time-invariant elements of unobservable bjt and Ujt Below we briefly discuss thecase where we allow time-invariant components of Ujt and the randomly sampledbjt to be arbitrarily dependent or correlated with the cause of interest Djt For this we denote Ujt equiv (Ujt Uj0) where Ujt is time varying and Uj0time-invariant Similarly we denote bjt equiv (bjt bj0) The randomly sampledUjt Uj0 bjt bj0 are denoted as Ut U0 Bt B0 respectively In princi-ple we want to remove the time-invariant Uj0 and bj0 using eg the firstdifference For this it is convenient to impose a separable assumption on thestructural equation9 We also impose a conditional exogeneity assumption at thesample level

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 22: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

284 XUN LU ET AL

Assumption B3 Suppose that B1 holds with

Yjt = r1(Djt Uj0 bj0

)+r2(Y jtminus1 Djt Zjt Ujt bjt

) j = 1 N t = 1

where r1 and r2 are two unknown ky times1 measurable functions

Assumption B4 Suppose that for given t Dt perp (Ut Utminus1 Bt Btminus1

) |Xt Y tminus1

B4 is a sample-level assumption which can be supported by a correspond-ing assumption at the heterogeneous population level Note that in general B4is a strong assumption as we condition on Y tminus1 which is a function of bothDtminus1 and Utminus1 Nevertheless this assumption is plausible under the null of struc-tural noncausality as Y jtminus1 is a constant function of Djtminus1 under the nullOne simple example is that bjt rsquos are constants over j and t and Djt infint=1 perpYjt 0

t=1minusyUjt infint=1X jt infint=1 Under the null of structural noncausality for

all t Y jtminus1 is a function of Yjs0s=1minusy

Ujsts=1X jst

s=1 Then it is easyto show that in this case B4 is satisfied Certainly it will be interesting to relaxB4 but we leave this for future research

PROPOSITION 43 Suppose that B1ndashB4 hold Suppose that Dgt rArrS Ygtand Dgtminus1 rArrS Ygtminus1 forallg isin Gt cupGtminus1 Then Dt perp Yt | Xt Y tminus1

Proposition 43 suggests that when there are time-invariant components in theunobservables we can test for structural noncausality by testing for Gminusnon-causality Dt perp Yt | Xt Y tminus1 under the conditional exogeneity assumption B4

42 Testing for Gminuscausality and structural causality in panel dataAs shown above we can perform a Gminusnon causality test to test for structuralnoncausality Testing Gminusnon-causality in panel data is also simply a conditionalindependence test Here we focus on testing Dt perp Yt | Xt Y tminus1

First assume that the joint distributions of (Dt Yt Xt ) are identical over t In this case we can pool the data and implement a conditional independence testas discussed in Section 36

Second suppose that the joint distributions of (Dt Yt Xt ) are different overtime t In this case for each time period t using the cross-section data we canimplement a conditional independence test for Dt perp Yt | (Xt Y tminus1) and obtain atest statistic sayWt Thus we have T test statistics W1W2 WT and eachone can be used to test for structural noncausality for each t We may also want totest the hypothesis that for all t Djt does not structurally cause Yjt For this wecan construct our test statistic by taking the average of the T test statistics ieW equiv 1

T

sumTt=1Wt as in Im Pesaran and Shin (2003) and Dumitrescu and Hurlin

(2012) Other forms of ldquoaveragesrdquo are possible and special care is needed to takeinto account the dependence structure over time

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 23: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 285

5 CONCLUSION

This paper provides direct links between Granger causality and structural causal-ity in cross-section and panel data We extend Granger causality to cross-section and panel data and give a natural definition of structural causality inheterogeneous populations We show that under the key conditional exogeneityassumption Granger causality is essentially equivalent to structural causality incross-section and panel data Similar to the results in White and Lu (2010) fortime-series data our results here should enable researchers to avoid the misuse ofGranger causality and to establish the desired structural causal relation

NOTES

1 Strictly speaking if the conditional distributions are parameterized or treated as infinite-dimensional parameters then distributional heterogeneity may also be thought of as a special caseof structural heterogeneity

2 The classification is certainly overly simplistic We ignore the early literature on the philosophyof causality (for a review see eg Holland 1986 and Hoover 2008) We also ignore the litera-ture on causality discussed in other disciplines notably in machine learning (see eg Pearl 20092015 White and Chalak 2009 and White Chalak and Lu 2011) In particular Heckman and Pinto(2015) compare Haavelmorsquos structural framework of causality with the framework of Directed AcyclicGraphs (DAG) studied in the Bayesian network

3 Although Uj is typically called ldquounobservablerdquo it is better to view its elements as variables thatwill be omitted from empirical analysis This may be because they are unobservable it may also bebecause the researcher has purposefully or inadvertantly neglected them

4 For notational simplicity we suppress the observation subscript i5 Using our notation Holland considers the case where D perp (XU ) which implies our condi-

tional exogeneity assumption D perp U | X below6 See eg Altonji and Matzkin (2005) Hoderlein and Mammen (2007 2009) Imbens and Newey

(2009) White and Lu (2011) and White and Chalak (2013)7 AT T = E(Y (1)minusY (0) | D = 1) = E(U minus0 | D = 1) = E(U ) = 0

8 The conditional density fg(d | x) can also be interpreted as Imbensrsquos (2000) generalized propen-sity score

9 There is a recent literature which allows time-invariant unobservables to be arbitrarily correlatedwith causes of interest in the general nonseparable models (see eg Evdokimov 2010) We leave theproblem of linking G-causality and structural causality in this general setting for future research

REFERENCES

Altonji J amp R Matzkin (2005) Cross section and panel data estimators for nonseparable models withendogenous regressors Econometrica 73 1053ndash1102

Angrist J K Graddy amp G Imbens (2000) The interpretation of instrumental variables estimatorsin simultaneous equations models with an application to the demand for fish Review of EconomicStudies 67 499ndash527

Angrist J O Jorda amp G Kuersteiner (2013) Semiparametric estimates of monetary policy effectsstring theory revisited NBER Working Paper 19355

Angrist J amp G Kuersteiner (2004) Semiparametric causality tests using the policy propensity scoreNBER Working Paper No 10975

Angrist J amp G Kuersteiner (2011) Causal effects of monetary shocks semiparametric conditionalindependence tests with a multinomial propensity score Review of Economics and Statistics 93725ndash747

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 24: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

286 XUN LU ET AL

Angrist J amp J Pischke (2009) Mostly Harmless Econometrics An Empiricistrsquos Companion PrincetonUniversity Press

Bester CA amp CB Hansen (2016) Grouped effects estimators in fixed effects models Journal ofEconometrics 190 197ndash208

Bonhomme S amp E Manresa (2015) Grouped patterns of heterogeneity in panel data Econometrica83 1147ndash1184

Browning M amp JM Carro (2007) Heterogeneity and microeconometrics modelling In R BlundellW K Newey and T Persson (Eds) Advances in Economics and Econometrics Theory andApplications Ninth World Congress of the Econometric Society Volume 3 pp 45ndash74 CambridgeUniversity Press

Browning M amp JM Carro (2010) Heterogeneity in dynamic discrete choice models EconometricsJournal 13 1ndash39

Burda M M Harding amp J Hausman (2015) A Bayesian semi-parametric competing risk model withunobserved heterogeneity Journal of Applied Econometrics 30 353ndash376

Chalak K amp H White (2011) An extended class of instrumental variables for the estimation of causaleffects Canadian Journal of Economics 44 1ndash51

Chalak K amp H White (2012) Causality conditional independence and graphical separation insettable systems Neural Computation 24 1611ndash1668

Chamberlain G (1982) The general equivalence of Granger and Sims causality Econometrica 50569ndash581

Chamberlain G (1984) Panel data In Z Griliches amp M Intriligator (eds) Handbook of Economet-rics vol2 ch 22 pp1247ndash1318 Elsevier

Chesher A (2003) Identification in nonseparable models Econometrica 71 1405ndash1441Chesher A (2005) Nonparametric identification under discrete variation Econometrica 73

1525ndash1550Dawid AP (1979) Conditional independence in statistical theory Journal of the Royal Statistical

Society Series B 41 1ndash31Deb P amp PK Trivedi (2013) Finite mixture for panels with fixed effects Journal of Econometric

Methods 2 35ndash51Delgado M amp W Gonzalez-Manteiga (2001) Significance testing in nonparametric regression based

on the bootstrap Annals of Statistics 29 1469ndash1507Dumitrescu E-I amp C Hurlin (2012) Testing for Granger non-causality in heterogeneous panels

Economic Modelling 29 1450ndash1460Evdokimov K (2010) Identification and estimation of a nonparametric panel data model with unob-

served heterogeneity Working paper Department of Economics Princeton UniversityFlorens JP (2003) Some technical issues in defining causality Journal of Econometrics 112

127ndash128Florens JP amp D Fougere (1996) Non-causality in continuous time Econometrica 64 1195ndash1212Florens JP amp M Mouchart (1982) A note on non-causality Econometrica 50 583ndash591Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral

methods Econometrica 37 424ndash438Granger CWJ (1980) Testing for causality a personal viewpoint Journal of Economic Dynamics

and Control 2 329ndash352Granger CWJ (1988) Some recent developments in a concept of causality Journal of Econometrics

39 199ndash211Granger CWJ amp P Newbold (1986) Forecasting Economic Time Series Academic Press 2nd

editionHausman J amp T Woutersen (2014) Estimating a semi-parametric duration model without specifying

heterogeneity Journal of Econometrics 178 114ndash131Haavelmo T (1943) The statistical implications of a system of simultaneous equations Econometrica

11 1ndash12Haavelmo T (1944) The probability approach in econometrics Econometrica 12 (supp) iii-vi

1ndash115

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 25: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 287

Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of averagetreatment effect Econometrica 66 315ndash331

Hahn J amp R Moon (2010) Panel data models with finite number of multiple equilibria EconometricTheory 26 863ndash881

Heckman JJ (2000) Causal parameters and policy analysis in economics a twentieth century retro-spective Quarterly Journal of Economics 115 45ndash97

Heckman JJ (2008) Econometric causality International Statistical Review 76 1ndash27Heckman JJ amp R Pinto (2015) Causal analysis after Haavelmo Econometric Theory 31

115ndash151Hirano K G Imbens amp G Ridder (2003) Efficient estimation of average treatment effects using the

estimated propensity score Econometrica 71 1161ndash1189Hoderlein S amp E Mammen (2007) Identification of marginal effects in nonseparable models without

monotonicity Econometrica 75 1513ndash1519Hoderlein S amp E Mammen (2009) Identification and estimation of local average derivatives in non-

separable models without monotonicity Econometrics Journal 12 1ndash25Hoderlein S amp H White (2012) Nonparametric identification in nonseparable panel data models with

generalized fixed effects Journal of Econometrics 168 300ndash314Holland P (1986) Statistics and causal inference Journal of the American Statistical Association 81

945ndash970Holtz-Eakin D W Newey amp HS Rosen (1988) Estimating vector autoregressions with panel data

Econometrica 56 1371ndash1396Hoover K (2008) Causality in economics and econometrics In S Durlauf and L Blume (eds)

The New Palgrave Dictionary of Economics 2nd edition Palgrave MacmillanHsiao C (2014) Analysis of Panel Data Cambridge University PressHuang M Y Sun amp H White (2016) A flexible nonparametric test for conditional independence

Econometric Theory forthcomingHurwicz L (1950) Generalization of the concept of identification In TC Koopmans (eds) Statistical

Inference in Dynamic Economic Models pp 238ndash257 John WileyIm K H Pesaran amp Y Shin (2003) Testing for unit roots in heterogeneous panels Journal of Econo-

metrics 115 53ndash74Imbens G (2000) The role of the propensity score in estimating dose-response functions Biometrika

87 706ndash710Imbens G (2004) Nonparametric estimation of average treatment effects under exogeneity a review

Review of Economics and Statistics 86 4ndash29Imbens G amp W Newey (2009) Identification and estimation of triangular simultaneous equations

models without additivity Econometrica 77 1481ndash1512Imbens G amp J Wooldridge (2009) Recent developments in the econometrics of program evaluation

Journal of Economic Literature 47 5ndash86Kasy M (2011) Identification in triangular systems using control functions Econometric Theory 27

663ndash671Koopmans T (1950) Statistical Inference in Dynamic Economic Models Cowles Commission Mono-

graph No 10 WileyKuersteiner G (2008) GrangerndashSims causality In S Durlauf and L Blume (Eds) The New Palgrave

Dictionary of Economics 2nd edition Palgrave MacmillanLechner M (2001) Identification and estimation of causal effect of multiple treatments under con-

ditional independence assumption In M Lechner F Pfeiller (eds) Econometric Evaluations ofLabour Market Policies Hildelberg Physical 43ndash58

Lechner M (2011) The relation of different concepts of causality used in time series and microecono-metrics Econometric Reviews 30 109ndash127

Lin C-C amp S Ng (2012) Estimation of panel data models with parameter heterogeneity when groupmembership is unknown Journal of Econometric Methods 1 42ndash55

Lindsay B (1995) Mixture Models Theory Geometry and Applications Hayward Institute for Math-ematical Statistics

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 26: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

288 XUN LU ET AL

Linton O amp P Gozalo (2014) Testing conditional independence restrictions Econometric Reviews33 523ndash552

Lu X amp L Su (2014) Determining the number of groups in latent panel structures Working paperSchool of Economics Singapore Management University

Matzkin R (2003) Nonparametric estimation of nonadditive random functions Econometrica 711339ndash1375

Matzkin R (2007) Nonparametric identification In JJ Heckman and EE Leamer (eds) The Hand-book of Econometrics 6B North-Holland

Mclachlan G amp K Basford (1988) Mixture Models Inference and Applications to Clustering MarcelDekker

Nair-Reichert U amp D Weinhold (2001) Causality tests for cross-country panels a new look at FDIand economic growth in developing countries Oxford Bulletin of Economics and Statistics 63153-171

Pearl J (2009) Causality Models Reasoning and Inference (2nd edition) Cambridge UniversityPress

Pearl J (2015) Trygve Haavelmo and the emergence of causal calculus Econometric Theory 31152ndash179

Rosenbaum P amp D Rubin (1983) The central role of the propensity score in observational studies forcausal effects Biometrika 70 41ndash55

Rubin D (1974) Estimating causal effects of treatments in randomized and non-randomized studiesJournal of Educational Psychology 66 688ndash701

Rubin D (2004) Direct and indirect causal effects via potential outcomes Scandinavian Journal ofStatistics 31 161ndash170

Sarafidis V amp N Weber (2015) A partially heterogeneous framework for analyzing panel data OxfordBulletin of Economics and Statistics 77 274ndash296

Sims C (1972) Money income and causality American Economic Review 62 540ndash552Song K (2009) Testing conditional independence via Rosenblatt transforms Annals of Statistics 37

4011ndash4015Strotz R amp H Wold (1960) Recursive vs nonrecursive systems An attempt at synthesis Economet-

rica 28 417ndash427Su L Z Shi amp PCB Phillips (2014) Identifying latent structures in panel data Working paper

Department of Economics Yale UniversitySu L amp H White (2007) A Consistent characteristic function-based test for conditional indepen-

dence Journal of Econometrics 141 807ndash834Su L amp H White (2008) A nonparametric Hellinger metric test for conditional independence Econo-

metric Theory 24 829ndash864Su L amp H White (2014) Testing conditional independence via empirical likelihood Journal of

Econometrics 182 27ndash44Sun Y (2005) Estimation and inference in panel structure models Working paper Department of

Economics UCSDWhite H amp K Chalak (2009) Settable systems An extension of Pearlrsquos causal model with optimiza-

tion equilibrium and learning Journal of Machine Learning Research 10 1759ndash1799White H amp K Chalak (2010) Testing a conditional form of exogeneity Economics Letters 109

88ndash90White H amp K Chalak (2013) Identification and identification failure for treatment effects using

structural systems Econometric Reviews 32 273ndash317White H K Chalak amp X Lu (2011) Linking Granger causality and the Pearl causal model with

settable systems Journal of Machine Learning Research Workshop and Conference Proceedings12 1ndash29

White H amp P Kennedy (2009) Retrospective estimation of causal effects through time In J Castleamp N Shephard (eds) The Methodology and Practice of Econometrics A Festschrift in Honour ofDavid F Hendry pp 59ndash87 Oxford University Press

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 27: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 289

White H amp X Lu (2010) Granger causality and dynamic structural systems Journal of FinancialEconometrics 8 193ndash243

White H amp X Lu (2011) Causal diagrams for treatment effect estimation with application to selectionof efficient covariates Review of Economics and Statistics 93 1453ndash1459

Wooldridge J (2005) Simple solutions to the initial conditions problem for dynamic nonlinear paneldata models with unobserved heterogeneity Journal of Applied Econometrics 20 39ndash54

Zellner A (1979) Causality and econometrics In K Brunner and AH Meltzer (eds) Three Aspectsof Policy and Policymaking Carnegie-Rochester Conference Series Vol 10 North-Holland 9ndash54

APPENDIX

Proof of Theorem 31 To show D perp U | X we establish that f (u | d x) = f (u | x)for all (ud x) We have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

= f (u x)

f (x)= f (u | x)

where the first equality in the second line holds since A2(i) (ie Dg | Xg sim D | X)implies fg(d x) = f (d | x) fg(x) say and the second holds by A2(i i) fg(u | d x) =fg(u | x) n

Proof of Theorem 32 Note that

f (u | x) = f (u x)

f (x)=

sumgisinG pg fg(u | x) fg(x)sum

gisinG pg fg(x)

We also have

f (u | d x) = f (ud x)

f (d x)=

sumgisinG pg fg(ud x)sumgisinG pg fg(d x)

=sumgisinG pg fg(u | d x) fg(d x)sum

gisinG pg fg(d x)

=sumgisinG pg fg(u | d x) f (d | x) fg(x)sum

gisinG pg f (d | x) fg(x)=

sumgisinG pg fg(u | d x) fg(x)sum

gisinG pg fg(x)

The first equality in the second line follows from A2(i) Then f (u | d x) = f (u | x) forall (ud x) if and only ifsumgisinG

pg fg(u | d x) fg(x) =sumgisinG

pg fg(u | x) fg(x) for all (ud x)

By assumption (Ug | Dg Xg) g isinG is regular It follows immediately that for all g isinGDg perp Ug | Xg n

Proof of Proposition 33 Take any g isin G and let j belong to group Jg Dg rArrSYg means that Yj = r(Zj Uj bj ) Thus Uj perp Dj | X j implies that Yj perp Dj | X j byDawid (1979 Lemmas 41 and 42) This holds for all g isin G Applying Theorem 31gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 28: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

290 XUN LU ET AL

Proof of Theorem 34 We prove this by showing that Y perp D | X implies that Dg rArrSYg for all g isinG Given the assumed regularity Theorem 32 ensures that Y perp D | X impliesthat Yg perp Dg | Xg for all g isin G

Given (i) eq(33) and A2(ii) imply that for all j in group JgE

(Yj |X j = x Dj = d

) = r1(d z bj

)+ E[r2

(zUj bj

) | Dj = d X j = x]

= r1(d z bj

)+ E[r2

(zUj bj

) | X j = x]

Clearly r1(d z bj

)is constant in d if and only if E

(Yj |Dj = d X j = x

)is constant in

d Yj perp Dj | X j implies that E(Yj |Dj = d X j = x

)is constant in d which then implies

that r1(d z bj

)is constant in d Thus Dj rArrS Yj so Dg rArrS Yg

Given (i i) let ky = 1 without loss of generality Then by A2(ii) for all j in group JgFUj (u | x) = E

[1Uj le u

∣∣ X j = x]

= E[

1r(d zUj bj

) le r(d zu bj

)∣∣ X j = x]

= E[

1r(Dj Zj Uj bj

) le r(d zu bj

)∣∣ Dj = d X j = x]

= E[

1Yj le r

(d zu bj

)∣∣ Dj = d X j = x]

= Fj(r(d zu bj

) | d x)

By strict monotonicity of Fj (y | d x) in y

r(d zu bj

) = Fminus1j

(FUj (u | x) | d x

)

Now Yj perp Dj | X j implies that Fminus1j

(FUj (u | x) | d x

)is constant in d so r

(d zu bj

)is

also a constant function in d ie Dj rArrS Yj so Dg rArrS Yg As either (i) or (i i) holds for each g isin G we have Dg rArrS Yg for all g isin G n

Proof of Theorem 35 To show that A1 - A2 and the assumed regularity for Ygtogether with Dg rArrS(Xg) Yg for some g isin G imply D perp Y | X we let j in group Jgg isin G be such that Dj rArrS(X j ) Yj A2(i i) ensures that Dj perp Uj | X j so

P[Yj le y | Dj = d X j = x] =int

1r(d zu bj ) le y

d Fj (u | d x)

=int

1r(d zu bj ) le y

d Fj (u | x)

By assumption Dj rArrS(X j ) Yj so there exists ylowast isin supp(Yj ) such that there is no mea-

surable mapping f jylowast for whichint

1r(d zu bj ) le ylowastd Fj (u | d x) = f jylowast(x) ae-x Specifically this rules out the possibility thatint

1r(d zu bj ) le ylowast

d Fj (u | X j ) = f jylowast(X j ) equiv P[Yj le ylowast | X j ] as

Thus there exists ylowast isin supp(Yj ) such that

P[Yj le ylowast | Dj = d X j = x

] = P[Yj le ylowast | X j = x

]

for x in a set of positive probability so Yj perp Dj | X j Given the regularity assumed for YgTheorem 32 (for Yg) gives D perp Y | X n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the

Page 29: 33 GRANGER CAUSALITY AND STRUCTURAL CAUSALITY IN … Lu_Su... · 2017. 2. 7. · 264 XUNLUETAL. natural experiments. The goal of this paper is to establish the analogous equiv-alence

GRANGER CAUSALITY AND STRUCTURAL CAUSALITY 291

Proof of Theorem 41 The proof is analogous to that of Theorem 31 n

Proof of Proposition 42 The proof is analogous to that of Proposition 33 n

Proof of Proposition 43 Under B3 Dgt rArrS Ygt and Dgtminus1 rArrS Ygtminus1 forallg isinGt cupGtminus1 imply that for j isin Jg there exist two measurable functions r1 and r2 such that

Yjt = r1(Uj0 bj0)+ r2(Y jtminus1 Zjt Ujt bjt

) j = 1 N t = 1

Hence

Yjt minusYjtminus1 = r2(Y jtminus1 Zjt Ujt bjt

)minus r2(Y jtminus2 Zjtminus1Ujtminus1 bjtminus1

)

At the sample level this means that

Yt minusYtminus1 = r2(Y tminus1 Zt Ut Bt

)minus r2(Y tminus2 Ztminus1Utminus1 Btminus1

)

Then B4 implies that

Dt perp (Xt Y tminus1Ut Utminus1 Bt Btminus1) | Xt Y tminus1

which further implies that Dt perp (Yt minusYtminus1) | Xt Y tminus1by Dawid (1979 Lemmas 41 and42) as (Yt minus Ytminus1) is a function of (Xt Y tminus1Ut Utminus1 Bt Btminus1) Then we have Dt perpYt | Xt Y tminus1 as Ytminus1 is a component of Y tminus1 n

Cambridge Core terms of use available at httpswwwcambridgeorgcoreterms httpsdoiorg101017S0266466616000086Downloaded from httpswwwcambridgeorgcore Singapore Management University (SMU) on 07 Feb 2017 at 053305 subject to the