
Distinguishing Cause from Effect Using Quantiles: Bivariate Quantile Causal Discovery

Natasa Tagasovska 1 2, Valérie Chavez-Demoulin 1, Thibault Vatter 3

Abstract
Causal inference using observational data is challenging, especially in the bivariate case. Through the minimum description length principle, we link the postulate of independence between the generating mechanisms of the cause and of the effect given the cause to quantile regression. Based on this theory, we develop Bivariate Quantile Causal Discovery (bQCD), a new method to distinguish cause from effect assuming no confounding, selection bias or feedback. Because it uses multiple quantile levels instead of the conditional mean only, bQCD is adaptive not only to additive, but also to multiplicative or even location-scale generating mechanisms. To illustrate the effectiveness of our approach, we perform an extensive empirical comparison on both synthetic and real datasets. This study shows that bQCD is robust across different implementations of the method (i.e., the quantile regression), computationally efficient, and compares favorably to state-of-the-art methods.

1. Introduction
Driven by the usefulness of causal inference in most scientific fields, an increasing body of research has contributed towards understanding the generative processes behind data. The aim is to elevate learning models towards more powerful interpretations: from correlations and dependencies towards causation (Pearl, 2009; Spirtes et al., 2000b; Dawid et al., 2007; Pearl et al., 2016; Schölkopf, 2019). While the gold standard for causal discovery is randomized controlled trials (Fisher, 1936), experiments or interventions in a system are often prohibitively expensive, unethical, or, in many cases, impossible. In this context, an alternative is to use observational data to infer causal relationships (Spirtes et al., 2000b; Maathuis & Nandy, 2016).

1 HEC, University of Lausanne  2 Swiss Data Science Center, EPFL/ETHZ  3 Statistics Department, Columbia University. Correspondence to: Natasa Tagasovska <natasa.tagasovska@epfl.ch>.

Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020. Copyright 2020 by the author(s).

[Figure 1: scatter plots of y against x for three cause-effect (X → Y) setups, with fitted quantile curves. Panels: (a) Additive, (b) Multiplicative, (c) Location-Scale.]
Figure 1. Diverse cause-effect (X → Y) setups. Green: median; blue: 0.9 quantile; red: 0.1 quantile. Focusing on the mean only is not enough in the location-scale and multiplicative cases.

This challenging task has been tackled by many, often relying on testing conditional independence and backed up by heuristics (Maathuis & Nandy, 2016; Spirtes & Zhang, 2016; Peters et al., 2017).
Borrowing from structural equations and graphical models, structural causal models (SCMs, Pearl et al., 2016; Peters et al., 2017) represent the causal structure of variables X_1, ..., X_d using equations such as
X_c = f_c(X_{PA(c),G}, N_c),  c ∈ {1, ..., d},
where f_c is a causal mechanism linking the child/effect X_c to its parents/direct causes X_{PA(c),G}, N_c is another variable independent of X_{PA(c),G}, and G is the directed graph obtained from drawing arrows from parents to their children. Further complications arise when observing only two variables: one cannot distinguish between latent confounding (X ← Z → Y) and direct causation (X → Y or X ← Y) without additional assumptions (Shimizu et al., 2006; Janzing et al., 2012; Peters et al., 2014; Lopez-Paz et al., 2015). In this paper, we focus on distinguishing cause from effect in a bivariate setting, assuming the presence of a causal link, but no confounding, selection bias or feedback.
An alternative is to impose specific structural restrictions. For example, (non-)linear additive noise models with Y = f(X) + N_Y (Shimizu et al., 2006; Hoyer et al., 2009; Peters et al., 2011) or post-nonlinear models Y = g(f(X) + N_Y) (Zhang & Hyvärinen, 2009) allow one to establish causal identifiability without some of those assumptions.
In Figure 1, we illustrate a limitation of causal inference methods based on the conditional mean and refer to Section 2.2 for more details on the underlying generative mechanisms.


[Figure 2: scatter plot of food expenditure against income for Belgian working-class households, with the fitted mean, median, 0.1 and 0.9 quantile curves.]
Figure 2. Heteroskedasticity in food expenditure as a function of income for Belgian working-class households. Our method uses multiple quantiles to find the correct causal direction.

From Figure 1(a), it is clear that the variance of the effect variable is independent of the cause. As a result, the independence between the cause and the effect's noise can be measured in various ways by subtracting an estimate of the conditional mean. In Figure 1(b) and (c), however, the mechanisms are Y = g(X)N_Y and Y = f(X) + g(X)N_Y respectively. In other words, the variance of the effect also depends on the cause, i.e., it exhibits heteroskedasticity. Assume that E(N_Y) = 0; then, in the multiplicative case, we have E(Y | X = x) = 0, and it is not sensible to use the conditional mean to identify the causal direction. But since Var(Y | X = x) = x², the variance is more informative (see the simulation sketch below). Similarly, features of the conditional distributions (e.g., conditional spread) other than the location might help. In such cases, relying on multiple (conditional) quantiles rather than on the mean only can help. As we show in Section 3, it allows us to correctly determine the causal direction across various benchmarks where additive-noise competitors fail when their assumptions are not met.
A classic example of real-world data displaying such features is the impact of income on expenditure on meals: as an individual's income increases, so do their food expenditure and its variability. An explanation could be as follows: while a poorer person will spend a rather constant amount on inexpensive food, wealthier individuals occasionally buy inexpensive food and sometimes eat expensive meals. Or, wealthier individuals have more leeway when deciding which fraction of their income to allocate to food expenditure, whereas poorer ones are constrained by necessity. This heteroskedastic effect can be observed in Figure 2, where our method uses multiple quantiles to find the correct causal direction.
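To make the point of Figure 1(b) concrete, the following minimal simulation (an illustration only, not the paper's code; it assumes numpy is available) generates a multiplicative pair Y = X · N_Y and compares the conditional mean with conditional quantiles within bins of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(0.0, 1.0, n)        # cause
n_y = rng.normal(0.0, 1.0, n)      # effect noise, E[N_Y] = 0
y = x * n_y                        # multiplicative mechanism as in Figure 1(b)

# Compare the conditional mean with the conditional 0.1/0.9 quantiles on bins of x.
edges = np.quantile(x, np.linspace(0.0, 1.0, 11))
for lo, hi in zip(edges[:-1], edges[1:]):
    sel = (x >= lo) & (x < hi)
    print(f"x in [{lo:+.2f}, {hi:+.2f}): "
          f"mean={y[sel].mean():+.3f}  "
          f"q0.1={np.quantile(y[sel], 0.1):+.3f}  "
          f"q0.9={np.quantile(y[sel], 0.9):+.3f}")
# The conditional mean stays near zero in every bin, while the 0.1 and 0.9
# quantiles spread out as |x| grows, reflecting Var(Y | X = x) = x^2.
```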

Another line of work avoids functional restrictions by relying on the independence of cause and mechanism postulate (Schölkopf et al., 2012; Peters et al., 2017).
Postulate 1 (Sgouritsa et al. 2015). The marginal distribution of the cause and the conditional distribution of the effect given the cause, corresponding to independent mechanisms of nature, are algorithmically independent (i.e., they contain no information about each other).
Information Geometric Causal Inference (IGCI, Janzing et al., 2012) uses the postulate directly for causal discovery. Similarly, Regression Error based Causal Inference (RECI, Blöbaum et al., 2018) develops structural assumptions based on the postulate, implying that X causing Y induces an asymmetry in mean-regression errors, that is
E[(Y − E[Y | X])²] ≤ E[(X − E[X | Y])²].  (1)
Alternatively, (Mooij et al., 2009; Janzing & Schölkopf, 2010) reformulate the postulate through asymmetries in Kolmogorov complexities (Kolmogorov, 1963) between marginal and conditional distributions. However, the halting problem (Turing, 1938) implies that the Kolmogorov complexity is not computable, and approximations or proxies have to be derived to make the concept practical. In this context, (Mooij et al., 2010) proposes a method based on the minimum message length principle using Bayesian priors, while others are based on reproducing kernel Hilbert space embeddings, such as EMD (Chen et al., 2014), FT (Liu & Chan, 2017) and KCDC (Mitrovic et al., 2018), or use entropy as a measure of complexity (Kocaoglu et al., 2017). A related line of work suggests using the minimum description length (MDL, Rissanen, 1978) principle as a proxy for Kolmogorov complexity: (Budhathoki & Vreeken, 2017) uses MDL for causal discovery on binary data, and Slope (Marx & Vreeken, 2017; 2019) implements (local and) global functional relations using MDL-based regression and is suitable for continuous data.
In this paper, we build on a similar idea, using quantile scoring as a proxy for the Kolmogorov complexity through the MDL principle. We develop a method leveraging asymmetries similar to (1), replacing the squared loss and conditional mean by the pinball loss and conditional quantiles. To the best of our knowledge, quantiles have only been mentioned in a somewhat related context by (Heinze-Deml et al., 2018), where quantile predictions are used to exploit the invariance of causal models across different environments. As opposed to (Heinze-Deml et al., 2018), our method uses an asymmetry directly derived from the postulate, and therefore it does not require an additional variable for the environment.
To avoid the restrictive assumptions imposed by standard quantile regression techniques (e.g., linearity of the quantiles or additive relationships), we suggest fully nonparametric bQCD implementations using three different approaches: copulas, quantile forests or quantile neural networks. We also show that bQCD is robust to the choice of regression approach. To the best of our knowledge, we are the first to explore the idea of using conditional quantiles to distinguish cause from effect in bivariate observational data.


Our main contributions are:

• a new method based on quantile scoring to determine the causal direction without any assumptions on the class of causal mechanisms, along with a theoretical analysis justifying its usage (Section 2),
• bivariate quantile causal discovery (bQCD), an efficient implementation robust to the choice of underlying regressor (Section 3.1),
• a new benchmark set of synthetic cause-effect pairs from additive, location-scale and multiplicative noise models (Section 3.2),
• a comparative study to benchmark bQCD against state-of-the-art alternatives (Section 3).

2. Causal Discovery using Quantiles
In this section, we develop our quantile-based method for distinguishing between cause and effect from continuous and discrete observational data.
Problem setting. We restrict ourselves to bivariate cases by considering pairs of univariate random variables. We further simplify the problem by assuming the existence of a causal relationship but the absence of confounding, selection bias, and feedback.

2.1. Kolmogorov Complexity and Quantile Scoring
Let X and Y be two random variables with some joint distribution F, and where F_X, F_Y and F_{Y|X}, F_{X|Y} are respectively the marginal and conditional distributions. The Kolmogorov complexity of a distribution F, denoted by K(F), is the length of the shortest program p that outputs F(x) up to precision q for all x, that is
K(F) = min { |p| : |U(p, q, x) − F(x)| ≤ 1/q, ∀x },
with U a Turing machine executing p on inputs (x, q) (see Janzing & Schölkopf, 2010, and references therein). In our setting, the Kolmogorov complexity can be leveraged to discover which one of X and Y is the cause and which one is the effect through the following theorem:
Theorem 1 (Mooij et al. 2010). If Postulate 1 holds and X causes Y, then K(F_X) + K(F_{Y|X}) ≤ K(F_Y) + K(F_{X|Y}).

Stated differently, the causal direction between X and Y can be recognized as the least complex, that is, the decomposition of the joint distribution leading to the lowest value of the Kolmogorov complexity. This is because, in this context, the direct translation of Postulate 1 is that the mutual information between F_X and F_{Y|X} equals zero while that between F_Y and F_{X|Y} does not. In other words, the number of bits saved when compressing X and Y jointly rather than compressing them independently is smaller when using the causal factorization of the joint distribution. Importantly, because we assumed the existence of a causal link, the asymmetry in Theorem 1 is not only necessary but also sufficient.
Remark 1. Note that our problem setting corresponds to assumptions A, B and C in (Mooij et al., 2010). Furthermore, Theorem 1 holds up to an additive constant, as the asymmetry does not depend on the strings involved, but may depend on the Turing machines they refer to (see e.g., Janzing & Schölkopf, 2010; Hernández-Orallo & Dowe, 2010). When using the same Turing machine to compute all complexities, carrying this constant over is not required.

Since the Kolmogorov complexity is not computable, we use the MDL principle (Rissanen, 1978) as a proxy: the less complex direction becomes the one allowing a better compression of the data. In other words, the correct causal direction allows one to store the information using the shortest description length, or code length (CL).
Two-step MDL encoding. To construct a coding scheme satisfying the MDL principle, we use a two-stage approach (see e.g., Hansen & Yu, 2001, Section 3.1): first encode a model, and then the data using that model. Assume that the goal is to compress X = {X_i}_{i=1}^n with X_i ∈ 𝒳 i.i.d. according to some distribution F_X ∈ ℱ, where the model class is known to be ℱ = {f_θ(·) | θ ∈ Θ} and θ is an indexing parameter. When θ is known, Shannon's source coding theorem implies that an encoding based on f_θ is "optimal": on average, it achieves the lower bound for any lossless compression, that is, the differential entropy −n ∫_𝒳 f_θ(z) log f_θ(z) dz. The code length of a dataset encoded using the known f_θ is given by minus its log-likelihood, CL(X | θ) = −∑_{i=1}^n log f_θ(X_i). This optimal scheme thus amounts to transmitting the true parameter along with the data encoded using its known distribution:
CL_θ(X) = CL(θ) + CL(X | θ),  (2)
where the two terms on the right-hand side result from transmitting the model and the data encoded using the model (see e.g., Hansen & Yu, 2001, Section 3.1).
Encoding marginals via quantiles. Since knowledge of the model class and true parameter is seldom achieved, consider compression using partial information about the data-generating process. Assuming that we only know a τ-quantile q_{X,τ} = arg inf_q {q | F_X(q) = τ}, we can compress the data by transmitting τ, q_{X,τ}, and the "residuals" E_X = {X_i − q_{X,τ}}_{i=1}^n. This encoding results in
CL_τ(X) = CL(τ) + CL(q_{X,τ}) + CL(E_X | q_{X,τ}, τ).  (3)

To encode E_X, it is then natural to use the asymmetric Laplace (AL) distribution (see Aue et al., 2014; Geraci & Bottai, 2007; Yu et al., 2003) with density f(z; q, τ) = τ(1 − τ) exp(−S_τ(q, z)), where S_τ(·, ·) is the quantile scoring (QS) function
S_τ(x_1, x_2) = (I{x_1 ≥ x_2} − τ)(x_1 − x_2).
Given that the distribution of E_X is generally unknown, using the AL is optimal in the sense that the population quantile is exactly the minimizer of the QS's expected value, namely q_{X,τ} = arg min_q E[S_τ(q, X)]. We revisit the link between QS and the AL in Section 2.2. Using this encoding, the last term on the right-hand side of (3) is finally given by the negative of the AL log-likelihood (Rissanen, 1986), that is
CL(E_X | q_{X,τ}, τ) = ∑_{i=1}^n S_τ(q_{X,τ}, X_i) − a_n(τ),  (4)
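As an illustration, the following sketch (our own, assuming only numpy; not the authors' implementation) computes the quantile score S_τ and the residual code length of (4), and checks numerically that the empirical τ-quantile approximately minimizes the summed score.

```python
import numpy as np

def quantile_score(q, x, tau):
    """Pinball loss S_tau(q, x) = (I{q >= x} - tau) * (q - x)."""
    return ((q >= x).astype(float) - tau) * (q - x)

def residual_code_length(x, q, tau):
    """Negative asymmetric-Laplace log-likelihood of the residuals, as in (4)."""
    return quantile_score(q, x, tau).sum() - len(x) * np.log(tau * (1.0 - tau))

rng = np.random.default_rng(1)
x = rng.exponential(2.0, size=5000)
tau = 0.8

# The empirical tau-quantile (approximately) minimizes the summed score,
# hence also the residual code length for this quantile level.
grid = np.linspace(x.min(), x.max(), 400)
sums = [quantile_score(q, x, tau).sum() for q in grid]
print("minimizer over grid :", grid[int(np.argmin(sums))])
print("empirical quantile  :", np.quantile(x, tau))
print("residual CL         :", residual_code_length(x, np.quantile(x, tau), tau))
```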

where a_n(τ) = n log(τ(1 − τ)).
Encoding conditionals via quantiles. Next, consider the problem of compressing {X_i, Y_i}_{i=1}^n with (X_i, Y_i) i.i.d. according to some joint distribution F. Because F can be decomposed using the marginal and the conditional distributions, one can proceed as above with F_X to compress X, and then use F_{Y|X} to compress Y given X. Assume similarly that the only information about the conditional data-generating process is the conditional τ-quantile q_{Y|X=x,τ} = arg inf_q {q | F_{Y|X=x}(q) = τ} of Y given X = x. Encoding a conditional distribution using (3) and (4) thus results in
CL_τ(Y | X) = CL(τ) + CL(q_{Y|X,τ}) + ∑_{i=1}^n S_τ(q_{Y|X=X_i,τ}, Y_i) − a_n(τ).  (5)
In other words, the data is compressed by transmitting τ, q_{Y|X,τ}, and the "residuals" E_{Y|X} = {Y_i − q_{Y|X=X_i,τ}}_{i=1}^n. Note that, for a given fixed quantile level τ, while q_{X,τ} is a real number, q_{Y|X,τ} is generally a function of the conditioning variable.
Causal identification via quantiles. The same idea can be applied to compress the data with the decomposition of the joint distribution F using the marginal F_Y and the conditional F_{X|Y} distributions. But according to Theorem 1 and using CLs as proxies for Kolmogorov complexities, if X is a cause of Y, one expects that for all τ ∈ (0, 1),
CL_τ(X) + CL_τ(Y | X) ≤ CL_τ(Y) + CL_τ(X | Y)
with high probability as the sample size increases. We thus make the following "identifying" assumption for X to be a cause of Y:
Assumption 1. P( (CL_τ(X) + CL_τ(Y | X)) / (CL_τ(Y) + CL_τ(X | Y)) ≤ 1 ) → 1 as n → ∞.

This assumption is called identifying because, if the ratio of CLs equals 1 for all τ ∈ (0, 1), quantile-based CLs cannot be leveraged for causal discovery. For any given sample size and quantile level τ, cases where CL(τ) = ∞, CL(q_{X,τ}) = ∞ or CL(q_{Y|X,τ}) = ∞ are problematic. The issue can be resolved by assuming, for instance, that all population quantities are computable and can be transmitted using a finite, albeit potentially increasing, precision. Using O/o for the usual asymptotic notations, we draw a first link between CLs and QSs at the population level.
Theorem 2. Assume that CL(l) = o(n) for l ∈ {τ, q_{X,τ}, q_{Y,τ}, q_{Y|X,τ}, q_{X|Y,τ}}; then Assumption 1 holds if and only if
( E[S_τ(q_{X,τ}, X_i)] + E[S_τ(q_{Y|X=X_i,τ}, Y_i)] ) / ( E[S_τ(q_{Y,τ}, Y_i)] + E[S_τ(q_{X|Y=Y_i,τ}, X_i)] ) ≤ 1.  (6)

Because the population (conditional) quantiles and the expectations involved in the ratio are seldom known in practice, Theorem 2 cannot be leveraged directly for causal discovery. However, it can be used to determine whether a given causal model satisfies Assumption 1, as exemplified below. The proof can be found in Section A.1.
Example 1. Consider X being a cause of Y in the linear model defined by X = θ_1 N_X and Y = γX + θ_2 N_Y, with N_X and N_Y two independent sources of noise and θ_1, θ_2, γ > 0 such that var(X) = var(Y). Note that the condition on the variances simply ensures that the variables have the same scale. Letting N_X, N_Y ∼ N(0, 1), we have X ∼ N(0, θ_1²) and Y ∼ N(0, γ²θ_1² + θ_2²). The variance equality condition can be satisfied by choosing γ ∈ (0, 1) and letting θ_2 = θ_1 √(1 − γ²), resulting in X | Y = y ∼ N(γy, (1 − γ²)θ_1²) and Y | X = x ∼ N(γx, (1 − γ²)θ_1²). Using the fact that, if Z ∼ N(μ, σ²), then E[S_τ(q_{Z,τ}, Z)] = σ c(τ), where c(τ) = e^{−Φ^{-1}(τ)²/2}/√(2π) with Φ the standard normal cumulative distribution function, it is then straightforward to verify that the ratio of expectations in Theorem 2 is equal to one independently of τ. In other words, the linear Gaussian model is not identifiable (Shimizu et al., 2006; Hoyer et al., 2009; Peters et al., 2014).
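The closed form E[S_τ(q_{Z,τ}, Z)] = σc(τ) makes Example 1 easy to check numerically. The sketch below (an illustration assuming scipy is available, not part of the paper) evaluates the ratio of Theorem 2 for the linear Gaussian model and finds it equal to one at every τ.

```python
import numpy as np
from scipy.stats import norm

def expected_qs_normal(sigma, tau):
    """E[S_tau(q_{Z,tau}, Z)] = sigma * c(tau) for Z ~ N(mu, sigma^2),
    with c(tau) the standard normal density at its tau-quantile."""
    return sigma * norm.pdf(norm.ppf(tau))

theta1, gamma = 1.3, 0.6
theta2 = theta1 * np.sqrt(1.0 - gamma**2)            # enforces var(X) = var(Y)
sd_x = theta1                                         # X ~ N(0, theta1^2)
sd_y = np.sqrt(gamma**2 * theta1**2 + theta2**2)      # equals theta1
sd_cond = np.sqrt(1.0 - gamma**2) * theta1            # both conditional standard deviations

for tau in (0.1, 0.5, 0.8):
    num = expected_qs_normal(sd_x, tau) + expected_qs_normal(sd_cond, tau)
    den = expected_qs_normal(sd_y, tau) + expected_qs_normal(sd_cond, tau)
    print(f"tau = {tau}: ratio = {num / den:.6f}")    # 1.000000 at every tau
```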

Remark 2. Developing structural or distributional assumptions, e.g., based directly on Postulate 1, such that Assumption 1 or (6) holds is an open problem. (Blöbaum et al., 2018) paves the way in the context of (1) and in the regime of almost deterministic relations. But quantiles and quantile scores are harder to manipulate than conditional variances and squared losses.
Causal discovery via quantiles. In order to leverage Assumption 1 for causal discovery, we let q̂_{X,τ}, q̂_{Y,τ}, q̂_{X|Y,τ}, q̂_{Y|X,τ} be estimators of the respective population quantiles, and further make the following assumption:

Assumption 2. The estimators q̂_{X,τ}, q̂_{Y,τ}, q̂_{X|Y,τ}, q̂_{Y|X,τ} satisfy
• |q̂_{X,τ} − q_{X,τ}| = o_p(1) and |q̂_{Y,τ} − q_{Y,τ}| = o_p(1),
• |q̂_{Y|X=x,τ} − q_{Y|X=x,τ}| = o_p(1) for every x and |q̂_{X|Y=y,τ} − q_{X|Y=y,τ}| = o_p(1) for every y,
• CL(q̂_{X,τ}) = o(n), CL(q̂_{Y|X,τ}) = o(n), CL(q̂_{Y,τ}) = o(n), and CL(q̂_{X|Y,τ}) = o(n),
using O_p/o_p for stochastic boundedness and convergence in probability.

The first two bullet points simply state that the unconditional and conditional quantile estimators are consistent, without rate. As for the third bullet point, note that having the same growth rate for the CL of all models does not prevent a smaller number of parameters in the correct causal direction (see e.g., Blöbaum et al., 2018; Marx & Vreeken, 2019). However, it means that, if the population quantiles q_{·,τ} are replaced by estimators q̂_{·,τ} in (3) and (5), the CLs of the models are all asymptotically dominated by the CLs of the residuals, namely the ∑_{i=1}^n S_τ(·, ·) terms.
The third bullet point of Assumption 2 includes the important case where all CLs are O(log n) (Rissanen, 1983; 1986): discretizing a compact parameter space with a n^{−1/2} grid (i.e., the magnitude of the estimation error) and transmitting an estimated parameter using a uniform encoder with this precision is optimal for regular parametric families. Using this n^{−1/2} precision, each parameter thus leads to a cost of (1/2) log n.
For nonparametric (i.e., with a non-Euclidean parameter space) models, a similar idea can be applied, as estimators typically converge at a rate slower than n^{−1/2}. For instance, it is well known that the optimal grid size for histogram estimators of continuous densities with bounded first derivative is proportional to n^{−1/3} (Theorem 6.11, Wasserman, 2006). And if each of the histogram heights is encoded using a n^{−1/2} grid (i.e., smaller than the estimation error), the resulting estimator's CL is proportional to (n^{1/3}/2) log n, which satisfies the third bullet point of Assumption 2. More generally, the fastest possible rate for kernel density estimators of continuous densities with k bounded derivatives is n^{−k/(2k+1)} (Theorem 6.31, Wasserman, 2006). As a result, discretizing the support on a grid proportional to n^{−k/(2k+1)} and encoding the kernel values using a n^{−1/2} grid ensures an estimator's CL proportional to (n^{k/(2k+1)}/2) log n.
However, Assumption 2 excludes datasets where quantiles can only be consistently estimated with models having "too many parameters". For instance, if the only consistent estimator is an over-parametrized neural network with o(n_layers × n_neurons/layer) > o(n), then Assumption 2 does not hold. But such a situation is unlikely in our bivariate setting.
We can then state the following theorem:

Theorem 3. Under Assumption 2, Assumption 1 holds if and only if
P( (S_{X,τ} + S_{Y|X,τ}) / (S_{Y,τ} + S_{X|Y,τ}) ≤ 1 ) → 1 as n → ∞,
with the scores S_{X,τ} = ∑_{i=1}^n S_τ(q̂_{X,τ}, X_i), S_{Y|X,τ} = ∑_{i=1}^n S_τ(q̂_{Y|X=X_i,τ}, Y_i), and similarly for Y and X | Y.

Note that Theorem 3 does not state that CLs and QSs are equivalent, but rather that inequalities in CLs imply inequalities in QSs with respect to a specific statistical model, and conversely. In other words, Theorem 3 implies that there is an equivalence between minimizing code length and quantile score. Hence, because of the MDL principle, the causal direction can be inferred from the lowest quantile score. The proof can be found in the supplementary material (Section A.2), but the intuition is as follows: because of the third bullet point in Assumption 2, QSs, namely the CLs of the residuals, asymptotically dominate the CLs of the models. As a result, using QSs corresponding to consistent models is sufficient for causal discovery.
Thanks to the stability (or invariance) of the true causal model, we expect Assumption 1 to hold over different quantile levels. However, since a single quantile is generally not enough to characterize a distribution, we further consider
S_X = ∫_0^1 S_{X,τ} dτ,  (7)
and similarly for Y, X | Y and Y | X. By pooling results at different quantile levels, we aim at better describing the marginal and conditional distributions. Arguing that estimating high and low (conditional) quantiles is hard, we could use only quantiles close to the median, that is, integrate over [0.4, 0.6] instead of [0, 1]. We empirically found that this was more error prone when the generative models have asymmetries or multiplicative noises. Finally, we use averaging through integration rather than the maximal QS difference over quantile levels because the scale of QS is not uniform (e.g., the closer to 0.5, the higher). Hence, using maximization would essentially mean basing the decision on the median only, whereas properly capturing the spread of the data is also important.
Decision rule 1 (Bivariate Quantile Causal Discovery). Let S_{X→Y} = S_X + S_{Y|X} and S_{Y→X} = S_Y + S_{X|Y}. If S_{X→Y} < S_{Y→X}, conclude that X causes Y. If S_{X→Y} > S_{Y→X}, conclude that Y causes X. Otherwise, do not decide.
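The following is a minimal, self-contained sketch of Decision rule 1. It is an illustration only: scikit-learn's gradient-boosted quantile regression, a plain z-standardization and equal quantile-level weights stand in for the paper's copula / quantile-forest / quantile-neural-network regressors, its transform to the standard normal scale, and the quadrature of Section 3.1.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def pinball(q, z, tau):
    # S_tau(q, z) = (I{q >= z} - tau) * (q - z)
    return ((q >= z).astype(float) - tau) * (q - z)

def direction_score(cause, effect, taus, weights):
    """S_{cause -> effect} = S_cause + S_{effect | cause}, pooled over quantile levels."""
    total = 0.0
    for tau, w in zip(taus, weights):
        q_marg = np.quantile(cause, tau)                    # unconditional quantile
        model = GradientBoostingRegressor(loss="quantile", alpha=tau, n_estimators=100)
        model.fit(cause.reshape(-1, 1), effect)             # conditional quantile regression
        q_cond = model.predict(cause.reshape(-1, 1))
        total += w * (pinball(q_marg, cause, tau).sum()
                      + pinball(q_cond, effect, tau).sum())
    return total

def bqcd_decide(x, y, taus=(0.25, 0.5, 0.75)):
    # z-standardization removes the gross scale difference between the variables.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    w = np.full(len(taus), 1.0 / len(taus))
    s_xy = direction_score(x, y, taus, w)
    s_yx = direction_score(y, x, taus, w)
    if s_xy < s_yx:
        return "X -> Y"
    if s_xy > s_yx:
        return "Y -> X"
    return "undecided"

# Example: a location-scale pair where X causes Y.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = np.tanh(x) + 0.5 * (1.0 + np.abs(x)) * rng.normal(size=1000)
print(bqcd_decide(x, y))
```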

2.2. Intuition
The mean and squared loss duality leads to using the MSE as a metric for mean regression even for non-Gaussian data, and the pinball loss plays a similar role in quantile regression. In the MDL paradigm, the optimal encoding uses the data's true distribution, which is usually unknown for the residuals of either a mean or quantile regression. Without knowing this optimal encoding, it is sensible to use a distribution related to the loss that is minimized.


A Gaussian or asymmetric Laplace (AL) encoding should thus not be seen as optimal, but as a "default" related to the specific mean or quantile regression problem. Furthermore, a Gaussian encoding is unlikely to be appropriate for quantile residuals, which are often asymmetrically distributed.
To make this idea more precise, consider (1), albeit from the point of view of the MDL principle. One seeks a model that allows the best compression of the data, measured in code length (CL). Using a two-part scheme as in (2), the total CL is the sum of the conditional mean's CL and that of the residuals, which is equal to the negative log-likelihood of the distribution used to encode them. Encoding the residuals using the normal distribution is natural in the same sense as using the mean squared error as a regression metric: minimizing the squared loss is equivalent to maximizing the Gaussian log-likelihood, thus minimizing the residuals' CL.
A similar reasoning can be applied to quantile regression, where encoding the residuals using an AL distribution (Koenker & Machado, 1999) represents a sensible default. This is because minimizing the quantile score (QS) is equivalent to maximizing the AL log-likelihood (a_n(τ) − QS), thus minimizing the residuals' CL. Intuitively, the likelihood corresponding to (conditional) residuals in the causal direction is higher, that is, the QS and CL are smaller: the shortest CL corresponds to the largest AL likelihood/smallest QS, which establishes the link between minimizing QS and the MDL principle.
Let us now illustrate the key ideas in the theoretical justification of bQCD. In Figure 3, we revisit the toy examples from Figure 1, namely the different setups: additive, multiplicative and location-scale causal pairs. We disentangle the CL's different components for both the causal X → Y and anti-causal Y → X directions, for two quantile levels τ ∈ {0.5, 0.8} and per generative model. Curves are obtained by averaging over 100 repetitions. As described in Section 2.1, causal discovery by MDL involves four summands for each possible direction. For instance, for X → Y, we have

• two for the marginal and conditional model parameters, CL(q̂_{X,τ}) and CL(q̂_{Y|X,τ}),
• and two for the marginal and conditional residuals, CL(E_X | q̂_{X,τ}, τ) and CL(E_{Y|X} | q̂_{Y|X,τ}, τ).
In Figure 3, we use "Model" for the former and "Residuals" for the latter, and sum them by type. The sum of both model and residual CLs is given by full lines. Unconditional and conditional quantiles are estimated using the empirical quantile and kernel quantile regression, respectively. A uniform encoding is assumed in both cases, resulting in respective model CLs of (1/2) log n and (n^{1/3}/2) log n.

[Figure 3: code lengths as a function of the sample size n (from 100 to 10000), for τ = 0.5 and τ = 0.8. Rows: X → Y, Y → X, and the ratio (X → Y)/(Y → X); line type distinguishes Model, Residuals and Both; colors distinguish the Additive, Multiplicative and Location-Scale models.]
Figure 3. Verifying Assumption 1 for the setups of Figure 1. The CLs of the residuals asymptotically dominate the CLs of the marginal and conditional models. The multiplicative model is not identifiable using the median only. The correct causal direction has a lower CL.

In the top four panels of Figure 3, it is clear that the CLs of the residuals (i.e., the QSs) dominate the CLs of the marginal and conditional models. This illustrates the third bullet point in Assumption 2. In the bottom two panels of Figure 3, we see that the ratio of causal to anti-causal CLs generally converges to some constant smaller than 1 (see Footnote 1), which illustrates Assumption 1. It also means that Decision rule 1 would lead to the correct causal direction. The multiplicative noise mechanism has an interesting behaviour: considering the median only (i.e., τ = 0.5), the ratio converges slowly to one, and therefore the causal direction is not identifiable. However, given that the ratio converges to some constant smaller than one for τ = 0.8, pooling the decision over multiple quantiles as in Decision rule 1 would again lead to the correct causal direction.

3. Experiments
3.1. bQCD implementation
Quantile regression. Theorem 3 holds provided that the model is consistent and its complexity does not grow too fast. As such, Decision rule 1 can be implemented using any quantile regression approach that satisfies Assumption 2. In our experiments, we use three methods, namely nonparametric copulas (Geenens et al., 2017), quantile forests (Meinshausen, 2006) and quantile neural networks (Cannon, 2018), and show that they yield qualitatively and quantitatively similar results.
Footnote 1: Because we assumed the same uniform encoding for both directions, the ratio of the CLs of the models is always equal to one.


We refer to Appendix B for more details on the regression methods and their specific implementations.
For estimating the overall quantile scores (aggregating multiple quantile levels), we use Legendre quadrature to approximate the integral over [0, 1], as it is fast and precise for univariate functions. In other words, denoting by {w_j, τ_j}_{j=1}^m the m pairs of quadrature weights and nodes, we use ∫_0^1 g(τ) dτ ≈ ∑_{j=1}^m w_j g(τ_j), which, when plugged into (7), yields S_X = ∑_{j=1}^m w_j S_{X,τ_j}, S_Y = ∑_{j=1}^m w_j S_{Y,τ_j}, S_{X|Y} = ∑_{j=1}^m w_j S_{X|Y,τ_j} and S_{Y|X} = ∑_{j=1}^m w_j S_{Y|X,τ_j}. Summing over an equally spaced grid with uniform weights or using quadrature nodes and weights yields two valid approximations of an integral, and using one or the other should not matter. But the quadrature gives more importance to the center of the distribution (i.e., quantiles closer to 0.5 have a higher weight); a sketch of this construction is given below. Note that, to compute scores free of scale bias, the variables are transformed to the standard normal scale.
Computational complexity. bQCD scales linearly with the size of the input data: O(n) for copulas, O(n_tree × n_attr × depth × n) for random forests and O(epochs × n_weights × n) for neural networks, corresponding to the choice of regressor. However, using multiple quantile levels does not increase the complexity significantly since all of the suggested implementations allow for simultaneous conditional quantile estimation; that is, we estimate a single model for each possible causal direction (see Footnote 2), from which we compute all of the requested quantiles. Roughly speaking, since n >> m, the overall complexity scales with O(n). As such, bQCD compares favorably to nonparametric methods relying on computationally intensive procedures, for instance based on kernels (Chen et al., 2014; Hernández-Lobato et al., 2016) or Gaussian processes (Hoyer et al., 2009; Mooij et al., 2010; Sgouritsa et al., 2015).
The parameter m can be used to control the trade-off between the computational complexity and the precision of the estimation. We recommend the value m = 3, which makes it possible to capture variability in both location and scale. Setting m = 1 is essentially equivalent to using only the conditional median for causal discovery, a setting that is suitable for distributions with constant variance. An empirical analysis of the choice of m is provided in the following section. In what follows, we report results for bQCD with m = 3 if not stated otherwise.
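As a small illustration of the quadrature just described (a sketch assuming numpy; the node count m = 3 matches the recommended default), the Gauss-Legendre nodes and weights on [−1, 1] are mapped to quantile levels on [0, 1]:

```python
import numpy as np

def legendre_levels(m=3):
    """Gauss-Legendre nodes/weights on [-1, 1], mapped to quantile levels on [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(m)
    taus = 0.5 * (nodes + 1.0)   # change of variables from [-1, 1] to [0, 1]
    ws = 0.5 * weights           # Jacobian of the mapping
    return taus, ws

taus, ws = legendre_levels(3)
print(taus)  # approx [0.11, 0.50, 0.89]; the central level ...
print(ws)    # approx [0.28, 0.44, 0.28]; ... gets the largest weight
```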

Footnote 2: Except for the copula-based bQCD, where the nature of copula models allows for joint estimation.

3.2. Datasets, baselines and metrics
Benchmarks. For simulated data, we first rely on the following scenarios (Mooij et al., 2016): SIM (without confounder), SIM-ln (with low noise), SIM-G (with distributions close to Gaussian), and SIM-c (with latent confounder). There are 100 pairs of size n = 1000 in each of these datasets.
As a second benchmark, inspired by (Peters et al., 2014), we generate a diverse dataset of additive, location-scale and multiplicative causal pairs (see the sketch below). We include nonlinear additive noise (AN) models of the form Y = f(X) + E_Y for some deterministic function f, with E_Y ∼ N(0, σ), X ∼ N(0, √2), and σ ∼ U[1/5, √2/5]. In AN, f is an arbitrary nonlinear function simulated using Gaussian processes (GP, Rasmussen & Williams, 2006) with a Gaussian kernel of bandwidth one. Since the functions in AN are often non-injective, we include AN-s to explore the behavior of bQCD in injective cases; in this setup, the f are sigmoids as in (Bühlmann et al., 2014). The third experiment considers location-scale (LS) data-generating processes with both the mean and variance of the effect being functions of the cause, that is Y = f(X) + g(X)E_Y, where E_Y and X are as for the additive noise models. LS and LS-s then correspond to the Gaussian processes and sigmoids described for AN and AN-s. Finally, the fourth experiment considers multiplicative models (MN) with Y = f(X)E_Y, where f(X) is sampled as a sigmoid function and E_Y ∼ U(0, 1). In each of the second, third and fourth experiments, we simulate 100 pairs of size n = 1000. All pairs have equal weights, with variable ordering determined by a coin flip, resulting in balanced datasets. Example datasets for each of the simulated experiments are shown in the supplementary material (Appendix F).
For real data, we use the Tübingen CE benchmark (version Dec 2017), consisting of 108 pairs from 37 different domains, from which we consider only the 99 pairs that have univariate continuous or discrete cause and effect variables. When evaluating the performance on this dataset, we included the pairs' corresponding weights, which account for potential bias in cases where several pairs were selected from the same multivariable dataset.
Baselines. On simulated data, we compare bQCD to state-of-the-art approaches, namely RESIT (Peters et al., 2014), biCAM (Bühlmann et al., 2014), LinGaM (Shimizu et al., 2006), and GR-AN (Hernández-Lobato et al., 2016), which are ANM-based, and IGCI (Janzing & Schölkopf, 2010), EMD (Chen et al., 2014), RECI (Blöbaum et al., 2018), Slope (Marx & Vreeken, 2017) and Sloppy (Marx & Vreeken, 2019), which are based on the independence postulate. We also consider other methods such as PNL-MLP (Zhang & Hyvärinen, 2009), GPI (Mooij et al., 2010), ANM (Hoyer et al., 2009), and CURE (Sgouritsa et al., 2015). All baselines distinguish between cause and effect solely in the bivariate case, except for CAM, RESIT and LinGaM, whose implementations allow for higher-dimensional causal discovery.
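For concreteness, here is a hedged sketch of generating one LS-s style pair in the spirit of the description above (numpy only; the sigmoid f follows the AN-s/LS-s setup, while the scale function g is an illustrative choice not specified in this excerpt, and the GP-sampled mechanisms of AN/LS are omitted).

```python
import numpy as np

def simulate_ls_pair(n=1000, seed=0):
    """One illustrative location-scale cause-effect pair with X -> Y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(2.0), n)                # cause; second argument read as a standard deviation
    sigma = rng.uniform(1.0 / 5.0, np.sqrt(2.0) / 5.0)  # noise scale, sigma ~ U[1/5, sqrt(2)/5]
    e_y = rng.normal(0.0, sigma, n)                      # effect noise
    f = lambda z: 1.0 / (1.0 + np.exp(-z))               # sigmoid mechanism (AN-s / LS-s style)
    g = lambda z: 1.0 + np.abs(z)                        # illustrative scale function, not from the paper
    return x, f(x) + g(x) * e_y                          # location-scale effect Y = f(X) + g(X) E_Y

x, y = simulate_ls_pair()
```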


[Figure 4: accuracy (%) of bQCD with m = 3 on each benchmark (SIM, SIM-c, SIM-ln, SIM-G, AN, AN-s, LS, LS-s, MN-U, Tueb) for the copula, random forest and neural network implementations.]
Figure 4. bQCD achieves consistent results across all benchmarks regardless of the practical implementation.

Implementation details and hyperparameters for all baselines are described in the supplementary material (Section C.1). Our code and datasets are available in the submitted supplementary package and at https://github.com/tagas/bQCD.
Evaluation metrics. As in (Mooij et al., 2016), we use the accuracy for forced decisions and the area under the receiver operating characteristic curve (ROC) for ranked decisions. The former corresponds to forcing the compared methods to decide the causal direction. The latter corresponds to using heuristic scores allowing to rank confidence in the directions, along with ROC/AUC as performance measure. As a confidence heuristic for the ranked decisions of bQCD, we use the same score as (24) in (Mooij et al., 2016), that is C = −S_{X→Y} + S_{Y→X}, with higher absolute values corresponding to higher confidence; a sketch of the resulting accuracy/decision-rate computation is given below. This score can also be paired with the null-hypercompressibility inequality (Grünwald & Grünwald, 2007) to derive a significance test and abstain from making insignificant inferences.
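A sketch of how the ranked-decision curve can be computed from such confidence scores (an illustration with made-up numbers, assuming numpy):

```python
import numpy as np

def accuracy_decision_rate(confidences, correct):
    """Rank pairs by |C|, most confident first, and compute the running accuracy."""
    order = np.argsort(-np.abs(np.asarray(confidences)))
    hits = np.asarray(correct, dtype=float)[order]
    rates = np.arange(1, len(hits) + 1) / len(hits)
    accuracies = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return rates, accuracies

# Toy usage with made-up confidences and outcomes:
rates, accs = accuracy_decision_rate([2.0, -0.1, 1.5, 0.3], [True, False, True, True])
print(list(zip(rates, accs)))
```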

3.3. Results and discussion
bQCD robustness with respect to its implementation. From Figure 4, it is clear that the choice of implementation for estimating the conditional quantiles has no significant impact, and bQCD provides consistent results across all benchmarks. Further results confirming the robustness to the implementation for different values of m are shown in Appendix B, Figure 8. In the remainder of the paper, we show the copula-based results.
Selection of m. In an ablation study, we explored the significance of the parameter m with regard to different sample sizes. Looking at Figure 5, we can clearly see in which cases multiple quantile levels can indeed increase the accuracy, namely for sigmoid (i.e., harder to detect) causal mechanisms. For confounded data, increasing m seems to help too, albeit more faintly. Additionally, we notice that higher values of m have a more pronounced effect as the sample size increases. This was used to improve our results on real data, namely by setting the parameter m to 1 when n < 200 and m = 3 for the rest.

[Figure 5: accuracy against sample size (250 to 750) for m ∈ {1, 3, 7, 15}, on (a) AN-s and (b) SIM-c.]
Figure 5. The accuracy increases with m and the sample size.

Comparison to baselines. In Figure 6, we compare causal discovery algorithms across simulated datasets with regard to accuracy; tabulated numbers are in Section D.1.

[Figure 6: accuracy (%) per benchmark (SIM, SIM-c, SIM-ln, SIM-G, AN-G, AN-s, LS, LS-s, MN-U, Tueb) for bQCD (m = 1, 3, 7), biCAM, IGCI (uniform/Gaussian reference), Slope, RESIT/ANM, LINGAM, EMD, GR-AN, GPI, PNL, Sloppy and RECI (with/without preprocessing).]

Figure 6. Accuracy of bQCD and competitors.
No single baseline is an overall best performer; however, we notice that bQCD has the most consistent results across all benchmarks. Starting with the SIM benchmarks, we notice that GPI achieves the highest accuracy in all four scenarios, followed with similar results by RESIT/ANM (see Footnote 3). On this benchmark, bQCD behaves similarly to the rest of the baselines, while being more robust in the confounded scenario, where others achieve results on the scale of a random guess.
Footnote 3: Because RESIT is an R version based on the MATLAB ANM, we overlay their results on a single chart.


Interestingly, higher values of the parameter m improve the results in such pairs.
The results are significantly different for the AN, LS and MN scenarios. biCAM, ANM and RESIT easily handle the AN pairs since their underlying assumptions are met, while we can notice some discrepancy in the LS and MN scenarios, where there is an interaction between the noise and the cause. Similarly, LinGaM does not perform well on any of the datasets, which are all highly nonlinear, hence violating its assumptions. IGCI can handle any scenario with the Gaussian reference measure, while this is not the case with the uniform measure (see Footnote 4). In the LS generative models, where not only the mean but also the variance of the effect changes with the cause, only IGCI-g was on par with bQCD, but bQCD is still better than IGCI-g on the SIM benchmark and the real data pairs. On the other hand, more flexible methods such as PNL and EMD had difficulties in the non-injective cases. bQCD has satisfactory performance (> 75% accuracy) for all the different data-generating mechanisms (AN, LS and MN).
Among the baselines presented, RECI's model allows for dependence between the noise and the cause. Although it shows good results in practice, it is important to note that the outcome depends significantly on the preprocessing step as well as on the selection of the regressor, for which there is still no clear guidance. For more details on the variations of the proposed solution, we refer to the original paper (Blöbaum et al., 2018). On the contrary, bQCD achieves the same results no matter the implementation, which makes it straightforward to use.
With the real data pairs, Table 1 shows that bQCD (see Footnote 5) is highly competitive in terms of weighted accuracy, with only Slope achieving better overall results. However, note that Slope does well only on this dataset and performs poorly on the synthetic benchmarks, while bQCD performs well under diverse setups. Additionally, we include an accuracy/decision-rate plot in Figure 7. Note that bQCD provides statistically significant results (i.e., compared to a coin flip) and is second out of the 6 best-performing algorithms with respect to weighted accuracy. In Appendix D, Figure 10, we further provide an accuracy/decision-rate plot and ROC curves with all baselines. Moreover, the efficiency of our method is highlighted in the last row of Table 1, where bQCD is able to go over the whole dataset in about 7 minutes. As for other nonparametric methods, only IGCI and Sloppy are faster; Slope is twice as slow, RESIT 55 times, PNL 71 times, and the others required days to go through the whole dataset or had to be averaged on subsamples due to slow execution (GR-AN).

Footnote 4: Selecting the reference measure is the method's most sensitive part and is as difficult as selecting the right kernel/bandwidth for a specific task (Janzing et al., 2012; Mooij et al., 2016).
Footnote 5: Results are averaged over 30 repetitions to account for the effect of the jittering in the discrete pairs.

Table 1. Results for the Tübingen Benchmark.

               bQCD     IGCI-u/g    biCAM    Slope    LINGAM    RESIT   Sloppy
Acc            0.68     0.67/0.61   0.57     0.75     0.3       0.53    0.59
Weighted Acc   0.75     0.72/0.62   0.58     0.83     0.36      0.63    0.7
AUC ROC        0.71     0.67        0.61     0.84     0.3       0.56    0.67
CPU            7 min.   2 sec.      10 sec.  25 min.  3.5 sec.  12 h    1.3 min.

               EMD       GR-AN    GPI       PNL-MLP   ANM       CURE    RECI
Acc            0.55      0.4      0.6       0.75      0.6       0.6     0.63
Weighted Acc   0.6       0.5      0.63      0.73      0.6       0.54    0.7
AUC ROC        0.53      0.47     0.61      0.7       0.45      0.61    0.68
CPU            4.6 days  NA       30 days   8.3 h     3.2 days  NA      1.2 h

[Figure 7: fraction of correct decisions against decision rate (0 to 1) for bQCD, RESIT, IGCI-u, PNL, EMD and Slope.]
Figure 7. Accuracy/decision-rates for the top baselines on the Tübingen Benchmark.

Overall, we can conclude that, compared to the baselines, bQCD performs well in both real and simulated scenarios, therefore being more robust to different generative models while also having computational advantages.

4. Conclusion
In this work, we develop a causal discovery method based on conditional quantiles. We propose bQCD, an effective implementation of our method based on nonparametric quantile regression, which compares favorably to state-of-the-art methods on both simulated and real datasets. The new method should be preferred mainly for three reasons. First, quantiles are less sensitive to outliers than the mean, so bQCD is robust to contamination or heavy tails and does not require a preprocessing ("cleaning") step. Second, we make no assumption on the parametric class considered, thus allowing for a wide range of mechanisms, whereas baselines perform worse when their assumptions are not satisfied. Third, our implementation is computationally more efficient than other nonparametric methods. There are currently several directions that we are exploring to extend this work. First, bQCD can be extended to multivariate cause-effect pairs. Second, it is interesting to see how quantile scores can fit in the invariant causal prediction framework (Peters et al., 2016). Third, the computational efficiency of bQCD is promising in the context of extensions to higher-dimensional datasets; as such, ongoing research leverages existing graph discovery algorithms for hybrid learning, as suggested in the supplementary material. Fourth, as discussed in Remark 2, finding structural or distributional assumptions for which Assumption 1 or (6) holds remains an open problem.


References
An Introduction to Causal Inference. The International Journal of Biostatistics, 2010.
Graph estimation with joint additive models. Biometrika, 2014.
Structural Intervention Distance for Evaluating Causal Graphs. Neural Computation, 27(3):771–799, March 2015.

Aas, K., Czado, C., Frigessi, A., and Bakken, H. Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2):182–198, 2009.
Aue, A., Cheung, R. C. Y., Lee, T. C. M., and Zhong, M. Segmented model selection in quantile regression using the minimum description length principle. Journal of the American Statistical Association, 109(109):507–1241, 2014.
Bauer, A. and Czado, C. Pair-copula Bayesian networks. Journal of Computational and Graphical Statistics, 25(4):1248–1271, 2016a.
Bauer, A. and Czado, C. Pair-copula Bayesian networks. Journal of Computational and Graphical Statistics, 25(4):1248–1271, 2016b.
Bauer, A., Czado, C., and Klein, T. Pair-copula constructions for non-Gaussian DAG models. The Canadian Journal of Statistics, 40(1):86–109, 2012.
Bedford, T., Cooke, R. M., et al. Vines – a new graphical model for dependent random variables. The Annals of Statistics, 30(4):1031–1068, 2002.
Besag, J. Spatial Interaction and the Statistical Analysis of Lattice Systems. Journal of the Royal Statistical Society, Series B (Methodological), 36(2):192–236, 1974.

Blöbaum, P., Janzing, D., Washio, T., Shimizu, S., and Schölkopf, B. Cause-effect inference by comparing regression errors. In International Conference on Artificial Intelligence and Statistics, pp. 900–909, 2018.
Breiman, L. Random forests. Machine Learning, 45(1):5–32, 2001.
Budhathoki, K. and Vreeken, J. MDL for Causal Inference on Discrete Data.
Budhathoki, K. and Vreeken, J. Causal inference by compression. In ICDM, pp. 41–50, 2017.
Bühlmann, P., Peters, J., and Ernest, J. CAM: Causal additive models, high-dimensional order search and penalized regression. Annals of Statistics, 42(6):2526–2556, 2014.
Cannon, A. J. Non-crossing nonlinear regression quantiles by monotone composite quantile regression neural network, with application to rainfall extremes. Stochastic Environmental Research and Risk Assessment, pp. 3207–3225, 2018. ISSN 1436-3259.

Chalupka, K., Eberhardt, F., and Perona, P. Causal featurelearning: an overview. Behaviormetrika, 44(1):137–164,2017.

Chang, Y., Li, Y., Ding, A., and Dy, J. A robust-equitablecopula dependence measure for feature selection. AIS-TATS, 41:84–92, 2016.

Chen, Z., Zhang, K., Chan, L., and Scholkopf, B. Causal dis-covery via reproducing kernel hilbert space embeddings.Neural Computation, 26(7):1484–1517, 2014.

Chickering, D. M. Optimal Structure Identification WithGreedy Search. Journal of Machine Learning Research,3:507–554, 2002.

Cui, R., Groot, P., and Heskes, T. Copula PC algorithm forcausal discovery from mixed data. In Lecture Notes inComputer Science (including subseries Lecture Notes inArtificial Intelligence and Lecture Notes in Bioinformat-ics), volume 9852 LNAI, pp. 337–392. Springer, Cham,sep 2016.

Dawid, C. A. et al. Fundamentals of statistical causality.2007.

Drton, M. and Maathuis, M. H. Structure Learning in Graph-ical Modeling. Annual Review of Statistics and Its Appli-cation, 4:365–393, 2017.

Elidan, G. Copula Bayesian Networks. In NIPS 23, pp.559–567, 2010.

Elidan, G. Copulas in machine learning. In Copulae in math-ematical and quantitative finance, pp. 39–60. Springer,2013.

Embrechts, P., Mcneil, E., and Straumann, D. Correlation:Pitfalls and alternatives. Risk Magazine, 1999.

Ernest, J. Causal inference in semiparametric and nonpara-metric structural equation models ETH Library. PhDthesis, 2016.

Fisher, R. A. Statistical Methods for Research Workers.

Fisher, R. A. Statistical methods for research workers. Es-pecially Section, 21, 1936.

Flaxman, S. R., Neill, D. B., and Smola, A. J. GaussianProcesses for Independence Tests with Non-iid Data inCausal Inference. ACM Transactions on Intelligent Sys-tems and Technology (TIST), 7(2):21–22, 2016.


Freedman, D. and Diaconis, P. On the histogram as a density estimator: L2 theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57(4):453–476, dec 1981. ISSN 1432-2064. doi: 10.1007/BF01025868. URL https://doi.org/10.1007/BF01025868.

Geenens, G., Charpentier, A., and Paindaveine, D. Probit transformation for nonparametric kernel estimation of the copula density. Bernoulli, 23(3):1848–1873, 2017.

Geraci, M. and Bottai, M. Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics, 8(1):140–154, 2007.

Gneiting, T. Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746–762, jun 2011.

Goudet, O., Kalainathan, D., Caillou, P., Guyon, I., Lopez-Paz, D., and Sebag, M. Causal Generative Neural Networks. 2017a.

Goudet, O., Kalainathan, D., Caillou, P., Lopez-Paz, D., Guyon, I., Sebag, M., Tritas, A., and Tubaro, P. Learning functional causal models with generative neural networks. preprint, nov 2017b. arXiv: 1709.05321.

Grunwald, P. D. and Grunwald, A. The minimum description length principle. MIT Press, 2007.

Guennebaud, G., Jacob, B., and Others. Eigen v3, 2010. URL http://eigen.tuxfamily.org.

Hall, P. and Hannan, E. J. On stochastic complexity and nonparametric density estimation. Biometrika, 75(4):705–714, 1988.

Hansen, M. H. and Yu, B. Model selection and the principle of minimum description length. Journal of the American Statistical Association, 96(454):746–774, 2001.

Harrell Jr, F. E., with contributions from Charles Dupont and others. Hmisc: Harrell Miscellaneous, 2017. URL https://cran.r-project.org/package=Hmisc.

Harris, N. and Drton, M. PC Algorithm for nonparanormal graphical models. Journal of Machine Learning Research, 14:3365–3383, 2013.

Heinze-Deml, C., Peters, J., and Meinshausen, N. Invariant causal prediction for nonlinear models. Journal of Causal Inference, 6(2), 2018.

Hernandez-Lobato, D., Morales Mombiela, P., Lopez-Paz, D., and Suarez, A. Non-linear causal inference using Gaussianity measures. Journal of Machine Learning Research, 17(1):939–977, 2016.

Hernandez-Orallo, J. and Dowe, D. L. Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174(18):1508–1539, 2010.

Hoeffding, W. A Non-Parametric Test of Independence. The Annals of Mathematical Statistics, 19(4):546–557, 1948.

Hoyer, P. O., Janzing, D., Mooij, J., Peters, J., and Scholkopf, B. Nonlinear causal discovery with additive noise models. In NIPS 22, pp. 689–696, 2009.

Hyvarinen, A. and Smith, S. M. Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models. Journal of Machine Learning Research, 14:111–152, 2013.

Janzing, D. and Scholkopf, B. Causal Inference using the algorithmic Markov condition. IEEE Transactions on Information Theory, 56(10):5168–5194, 2010.

Janzing, D., Peters, J., Mooij, J., and Scholkopf, B. Identifying confounders using additive noise models.

Janzing, D., Mooij, J., Zhang, K., Lemeire, J., Zscheischler, J., Daniusis, P., Steudel, B., and Scholkopf, B. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182:1–31, 2012.

Joe, H. Families of m-variate distributions with given margins and m(m-1)/2 bivariate dependence parameters. In Distributions with Fixed Marginals and Related Topics, pp. 120–141. Institute of Mathematical Statistics, 1996.

Karra, K. and Mili, L. Hybrid Copula Bayesian Networks. In Conference on Probabilistic Graphical Models, volume 52, pp. 240–251, 2016.

Kocaoglu, M., Dimakis, A. G., Vishwanath, S., and Hassibi, B. Entropic causal inference. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.

Koenker, R. and Machado, J. A. Goodness of fit and related inference processes for quantile regression. Journal of the American Statistical Association, 94(448):1296–1310, 1999.

Koenker, R. Quantile regression. Econometric Society Monographs, 2005.

Kolmogorov, A. N. On tables of random numbers. Sankhya: The Indian Journal of Statistics, Series A, pp. 369–376, 1963.

Kpotufe, S., Sgouritsa, E., Janzing, D., and Scholkopf, B. Consistency of causal inference under the additive noise model. In ICML 31, pp. 478–486, 2014.

Lichman, M. UCI Machine Learning Repository, 2013. URL http://archive.ics.uci.edu/ml.


Liu, F. and Chan, L.-W. Causal inference on multidimensional data using free probability theory. IEEE Transactions on Neural Networks and Learning Systems, 2017.

Liu, H., Lafferty, J., and Wasserman, L. The Nonparanormal: semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10:2295–2328, 2009.

Loader, C. Local regression and likelihood. Springer Science & Business Media, 2006.

Lopez-Paz, D. From Dependence to Causation. PhD thesis, University of Cambridge, 2016.

Lopez-Paz, D., Hernandez-Lobato, J. M., and Scholkopf, B. Semi-supervised domain adaptation with copulas. NIPS 26, pp. 674–682, 2013.

Lopez-Paz, D., Muandet, K., Scholkopf, B., and Tolstikhin, I. Towards a learning theory of cause-effect inference. In ICML 32, pp. 1452–1461, 2015.

Maathuis, M. H. and Nandy, P. A review of some recent advances in causal inference. In Handbook of Big Data. CRC Press, 2016.

Mandros, P., Boley, M., and Vreeken, J. Discovering reliable approximate functional dependencies. KDD, pp. 355–364, 2017.

Marx, A. and Vreeken, J. Telling cause from effect using MDL-based local and global regression. In ICDM, 2017.

Marx, A. and Vreeken, J. Identifiability of cause and effect using regularized regression. In ACM SIGKDD, 2019.

Meinshausen, N. Quantile regression forests. JMLR, 2006.

Mitrovic, J., Sejdinovic, D., and Teh, Y. W. Causal inference via kernel deviance measures. In Advances in Neural Information Processing Systems, pp. 6986–6994, 2018.

Mooij, J., Janzing, D., Peters, J., and Scholkopf, B. Regression by dependence minimization and its application to causal inference in additive noise models. In ICML 26, pp. 745–752, 2009.

Mooij, J. M., Stegle, O., Janzing, D., Zhang, K., and Scholkopf, B. Probabilistic latent variable models for distinguishing between cause and effect. In NIPS 23, pp. 1687–1695, 2010.

Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J., and Scholkopf, B. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17:1–102, 2016.

Muller, D. and Czado, C. Selection of sparse vine copulas in high dimensions with the lasso. may 2017.

Nagler, T. and Vatter, T. vinecopulib: High Performance Algorithms for Vine Copula Modeling in C++. http://vinecopulib.org, 2017.

Nagler, T. and Vatter, T. rvinecopulib: high performance algorithms for vine copula modeling, 2018. URL https://cran.r-project.org/package=rvinecopulib.

Oates, C. J., Smith, J. Q., and Mukherjee, S. Estimating Causal Structure Using Conditional DAG Models. Journal of Machine Learning Research, 17:1–23, 2016a.

Oates, C. J., Smith, J. Q., and Mukherjee, S. Estimating Causal Structure Using Conditional DAG Models. Journal of Machine Learning Research, 17:1–23, 2016b.

Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Series in Representation and Reasoning, 1988.

Pearl, J. Causality. Cambridge University Press, 2009.

Pearl, J., Glymour, M., and Jewell, N. P. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.

Peters, J. and Ernest, J. CAM: Causal Additive Model (CAM), 2015. URL https://cran.r-project.org/package=CAM.

Peters, J., Mooij, J. M., Janzing, D., and Scholkopf, B. Identifiability of causal graphs using functional models. In UAI 27, pp. 589–598, 2011.

Peters, J., Mooij, J. M., Janzing, D., and Scholkopf, B. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15(1):2009–2053, 2014.

Peters, J., Buhlmann, P., and Meinshausen, N. Causal inference by using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 78(5):947–1012, 2016.

Peters, J., Janzing, D., and Scholkopf, B. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press (available on-line), 2017.

Pircalabelu, E., Claeskens, G., and Gijbels, I. Copula directed acyclic graphs. Statistics and Computing, 27(1):55–78, 2017.

R Core Team. R: A language and environment for statistical computing, 2017. URL https://www.r-project.org/.

Rasmussen, C. E. and Williams, C. K. I. Gaussian Processes for Machine Learning, volume 1. MIT Press, Cambridge, 2006.


Rissanen, J. Modeling by shortest data description. Automatica, 14(5):465–471, 1978.

Rissanen, J. A Universal Prior for Integers and Estimation by Minimum Description Length. The Annals of Statistics, 11(2):416–431, 1983.

Rissanen, J. Stochastic complexity and modeling. The Annals of Statistics, pp. 1080–1100, 1986.

Rissanen, J. Complexity and Information in Modeling. pp. 1–16, 2005.

Rissanen, J., Speed, T. P., and Yu, B. Density estimation by stochastic complexity. IEEE Transactions on Information Theory, 38(2):315–323, 1992.

Rojas-Carulla, M., Baroni, M., and Lopez-Paz, D. Causal Discovery Using Proxy Variables. 2017.

Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D. A., and Nolan, G. P. Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science, 308(5721):523–529, apr 2005.

Scaillet, O., Charpentier, A., and Fermanian, J.-D. The estimation of copulas: Theory and practice. Technical report, Ensae-Crest and Katholieke Universiteit Leuven, NP-Paribas and Crest; HEC Genève and Swiss Finance Institute, 2007.

Schaling, B. The Boost C++ Libraries. 2011.

Scholkopf, B. Causality for machine learning. arXiv preprint arXiv:1911.10500, 2019.

Scholkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K., and Mooij, J. On causal and anticausal learning. In ICML 29, pp. 1255–1262, 2012.

Scholkopf, B., Hogg, D. W., Wang, D., Foreman-Mackey, D., Janzing, D., Simon-Gabriel, C.-J., and Peters, J. Modeling confounding by half-sibling regression. Proceedings of the National Academy of Sciences, 2016.

Sgouritsa, E., Janzing, D., Hennig, P., and Scholkopf, B. Inference of cause and effect with unsupervised inverse regression. In AISTATS 38, pp. 847–855, 2015.

Shimizu, S., Hoyer, P., Hyvarinen, A., and Kerminen, A. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris, 8:229–231, 1959.

Smyth, G. K. Numerical integration. Encyclopedia of Biostatistics, pp. 3088–3095, 2005.

Spirtes, P. and Glymour, C. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62–72, 1991.

Spirtes, P. and Zhang, K. Causal discovery and inference: Concepts and recent methodological advances. Applied Informatics, pp. 165–191, 2016.

Spirtes, P., Meek, C., and Richardson, T. Causal Inference in the Presence of Latent Variables and Selection Bias. In UAI 11, pp. 499–506, 1995.

Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. MIT Press, 2000a.

Spirtes, P., Glymour, C., and Scheines, R. Causation, Prediction, and Search. MIT Press, 2000b.

Tagasovska, N. and Lopez-Paz, D. Single-model uncertainties for deep learning. In NeurIPS, 2019.

Tagasovska, N., Ackerer, D., and Vatter, T. Copulas as high-dimensional generative models: Vine copula autoencoders. In Advances in Neural Information Processing Systems, pp. 6525–6537, 2019.

Takeuchi, I., Bengio, Y., and Kanamori, T. Robust Regression with Asymmetric Heavy-Tail Noise Distributions. Neural Computation, 14(10):2469–2496, oct 2002.

Tran, D., Blei, D. M., and Airoldi, E. M. Copula variational inference. In NIPS 28, pp. 3564–3572, 2015.

Tsamardinos, I., Brown, L. E., and Aliferis, C. F. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, oct 2006.

Turing, A. M. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 2(1):230–265, 1937.

Turing, A. M. On computable numbers, with an application to the Entscheidungsproblem. A correction. Proceedings of the London Mathematical Society, 2(1):544–546, 1938.

Vatter, T. and Chavez-Demoulin, V. Generalized additive models for conditional dependence structures. Journal of Multivariate Analysis, 2015.

Vereshchagin, N. and Vitanyi, P. Kolmogorov's Structure Functions and Model Selection. IEEE Transactions on Information Theory, 50(12):3265–3290, dec 2004.

Vreeken, J. Causal Inference by Direction of Information. SIAM, 2015. URL http://eda.mmci.uni-saarland.de/pubs/2015/ergo-vreeken.pdf.


Wasserman, L. All of Nonparametric Statistics. Springer Science & Business Media, 2006.

Wieczorek, A., Wieser, M., Murezzan, D., and Roth, V. Learning sparse latent representations with the deep copula information bottleneck. ICLR, 2018.

Yu, H. and Dauwels, J. Modeling Spatio-Temporal Extreme Events Using Graphical Models. IEEE Transactions on Signal Processing, 64(1), 2016.

Yu, K., Lu, Z., and Stander, J. Quantile regression: applications and current research areas. Journal of the Royal Statistical Society: Series D (The Statistician), 52(3):331–350, 2003.

Zhang, K. and Hyvarinen, A. On the identifiability of the post-nonlinear causal model. In UAI 25, pp. 647–655, 2009.

Zhang, K., Muandet, K., Wang, Z., and Others. Domain adaptation under target and conditional shift. ICML 30, pp. 819–827, 2013.

Zhang, K., Scholkopf, B., Spirtes, P., and Glymour, C. Learning causality and causality-related learning: some recent progress. National Science Review, pp. nwx137, 2017.

Zhang, K., Wang, Z., Zhang, J., and Scholkopf, B. On Estimation of Functional Causal Models: General Results and Application to the Post-Nonlinear Causal Model. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):13, 2016.


A. Theory

A.1. Proof of Theorem 2

We start by proving the sufficiency part of the theorem, namely that, if CL(l) = o(n) for l ∈ {τ, q_{X,τ}, q_{Y,τ}, q_{Y|X,τ}, q_{X|Y,τ}}, then

\[
P\left( \frac{CL_\tau(X) + CL_\tau(Y \mid X)}{CL_\tau(Y) + CL_\tau(X \mid Y)} \le 1 \right) \xrightarrow[n \to \infty]{} 1 \tag{8}
\]

implies that

\[
\frac{E\left[S_\tau(q_{X,\tau}, X_i)\right] + E\left[S_\tau(q_{Y \mid X = X_i,\tau}, Y_i)\right]}{E\left[S_\tau(q_{Y,\tau}, Y_i)\right] + E\left[S_\tau(q_{X \mid Y = Y_i,\tau}, X_i)\right]} \le 1. \tag{9}
\]

First, notice that we have

\[
\frac{CL_\tau(X) + CL_\tau(Y \mid X)}{CL_\tau(Y) + CL_\tau(X \mid Y)}
= \frac{\sum_{i=1}^{n} S_\tau(q_{X,\tau}, X_i) + \sum_{i=1}^{n} S_\tau(q_{Y \mid X = X_i,\tau}, Y_i) + K_1}{\sum_{i=1}^{n} S_\tau(q_{Y,\tau}, Y_i) + \sum_{i=1}^{n} S_\tau(q_{X \mid Y = Y_i,\tau}, X_i) + K_2}, \tag{10}
\]

with

\[
K_1 = 2\, CL(\tau) + CL(q_{X,\tau}) + CL(q_{Y \mid X,\tau}) - 2 a_n(\tau) = o(n) - 2 a_n(\tau),
\]
\[
K_2 = 2\, CL(\tau) + CL(q_{Y,\tau}) + CL(q_{X \mid Y,\tau}) - 2 a_n(\tau) = o(n) - 2 a_n(\tau).
\]

Thus, we have that

\[
\frac{CL_\tau(X) + CL_\tau(Y \mid X)}{CL_\tau(Y) + CL_\tau(X \mid Y)} \le 1
\iff
\frac{\sum_{i=1}^{n} S_\tau(q_{X,\tau}, X_i) + \sum_{i=1}^{n} S_\tau(q_{Y \mid X = X_i,\tau}, Y_i)}{\sum_{i=1}^{n} S_\tau(q_{Y,\tau}, Y_i) + \sum_{i=1}^{n} S_\tau(q_{X \mid Y = Y_i,\tau}, X_i)} \le 1 + o_p(1), \tag{11}
\]

where we used the fact that K_2 − K_1 = o(n), which implies that

\[
\frac{K_2 - K_1}{\sum_{i=1}^{n} S_\tau(q_{Y,\tau}, Y_i) + \sum_{i=1}^{n} S_\tau(q_{X \mid Y = Y_i,\tau}, X_i)} = o_p(1).
\]

Finally, (9) follows because the law of large numbers along with the continuous mapping theorem imply

\[
\left| \frac{\sum_{i=1}^{n} S_\tau(q_{X,\tau}, X_i) + \sum_{i=1}^{n} S_\tau(q_{Y \mid X = X_i,\tau}, Y_i)}{\sum_{i=1}^{n} S_\tau(q_{Y,\tau}, Y_i) + \sum_{i=1}^{n} S_\tau(q_{X \mid Y = Y_i,\tau}, X_i)}
- \frac{E\left[S_\tau(q_{X,\tau}, X_i)\right] + E\left[S_\tau(q_{Y \mid X = X_i,\tau}, Y_i)\right]}{E\left[S_\tau(q_{Y,\tau}, Y_i)\right] + E\left[S_\tau(q_{X \mid Y = Y_i,\tau}, X_i)\right]} \right| = o_p(1).
\]

To prove necessity, assume that

\[
P\left( \frac{CL_\tau(Y) + CL_\tau(X \mid Y)}{CL_\tau(X) + CL_\tau(Y \mid X)} \le 1 \right) \xrightarrow[n \to \infty]{} 1.
\]

By the sufficiency part of the proof, this implies that

\[
\frac{E\left[S_\tau(q_{X,\tau}, X_i)\right] + E\left[S_\tau(q_{Y \mid X = X_i,\tau}, Y_i)\right]}{E\left[S_\tau(q_{Y,\tau}, Y_i)\right] + E\left[S_\tau(q_{X \mid Y = Y_i,\tau}, X_i)\right]} \ge 1,
\]

which establishes necessity by contraposition. ∎


A.2. Proof of Theorem 3

We start by proving the sufficiency part of the theorem, namely that, if Assumption 2 holds, then

\[
P\left( \frac{CL_\tau(X) + CL_\tau(Y \mid X)}{CL_\tau(Y) + CL_\tau(X \mid Y)} \le 1 \right) \xrightarrow[n \to \infty]{} 1 \tag{12}
\]

implies that

\[
P\left( \frac{\hat S_{X,\tau} + \hat S_{Y \mid X,\tau}}{\hat S_{Y,\tau} + \hat S_{X \mid Y,\tau}} \le 1 \right) \xrightarrow[n \to \infty]{} 1. \tag{13}
\]

Denote by \(\widehat{CL}_\tau(\cdot)\) the quantity corresponding to CL_\tau(\cdot) but with the population quantiles replaced by their estimates, namely

\[
\widehat{CL}_\tau(X) = CL(\tau) + CL(\hat q_{X,\tau}) + \hat S_{X,\tau} - a_n(\tau),
\]
\[
\widehat{CL}_\tau(Y \mid X) = CL(\tau) + CL(\hat q_{Y \mid X,\tau}) + \hat S_{Y \mid X,\tau} - a_n(\tau),
\]

and similarly for Y and X | Y. Hence, we have

\[
\frac{\widehat{CL}_\tau(X) + \widehat{CL}_\tau(Y \mid X)}{\widehat{CL}_\tau(Y) + \widehat{CL}_\tau(X \mid Y)}
= \frac{\hat S_{X,\tau} + \hat S_{Y \mid X,\tau} + K_1}{\hat S_{Y,\tau} + \hat S_{X \mid Y,\tau} + K_2}, \tag{14}
\]

with

\[
K_1 = 2\, CL(\tau) + CL(\hat q_{X,\tau}) + CL(\hat q_{Y \mid X,\tau}) - 2 a_n(\tau),
\]
\[
K_2 = 2\, CL(\tau) + CL(\hat q_{Y,\tau}) + CL(\hat q_{X \mid Y,\tau}) - 2 a_n(\tau).
\]

Consistency of the estimators (i.e., Assumption 2) along with the continuous mapping theorem implies

\[
\left| \frac{\widehat{CL}_\tau(X) + \widehat{CL}_\tau(Y \mid X)}{\widehat{CL}_\tau(Y) + \widehat{CL}_\tau(X \mid Y)}
- \frac{CL_\tau(X) + CL_\tau(Y \mid X)}{CL_\tau(Y) + CL_\tau(X \mid Y)} \right| = o_p(1). \tag{15}
\]

Therefore, combining (12) and (15), we have that

\[
P\left( \frac{\widehat{CL}_\tau(X) + \widehat{CL}_\tau(Y \mid X)}{\widehat{CL}_\tau(Y) + \widehat{CL}_\tau(X \mid Y)} \le 1 \right) \xrightarrow[n \to \infty]{} 1. \tag{16}
\]

In turn, from (14), we have that

\[
\frac{\widehat{CL}_\tau(X) + \widehat{CL}_\tau(Y \mid X)}{\widehat{CL}_\tau(Y) + \widehat{CL}_\tau(X \mid Y)} \le 1
\iff
\frac{\hat S_{X,\tau} + \hat S_{Y \mid X,\tau}}{\hat S_{Y,\tau} + \hat S_{X \mid Y,\tau}} \le 1 + \frac{K_2 - K_1}{\hat S_{Y,\tau} + \hat S_{X \mid Y,\tau}} = 1 + o_p(1), \tag{17}
\]

where the last equality follows from assuming, without loss of generality, that CL(τ) = o(n), which implies that K_1 = o(n) − 2 a_n(τ) and K_2 = o(n) − 2 a_n(τ), and thus that K_2 − K_1 = o(n). Finally, (13) follows from combining (16) and (17).

To prove necessity, assume that

\[
P\left( \frac{\widehat{CL}_\tau(Y) + \widehat{CL}_\tau(X \mid Y)}{\widehat{CL}_\tau(X) + \widehat{CL}_\tau(Y \mid X)} \le 1 \right) \xrightarrow[n \to \infty]{} 1.
\]

By the sufficiency part of the proof, this implies that

\[
P\left( \frac{\hat S_{X,\tau} + \hat S_{Y \mid X,\tau}}{\hat S_{Y,\tau} + \hat S_{X \mid Y,\tau}} \ge 1 \right) \xrightarrow[n \to \infty]{} 1,
\]

which establishes necessity by contraposition. ∎


B. bQCD implementation

To leverage Theorem 3 and Decision rule 1 for causal discovery, we need a flexible model that yields computable expressions for all conditional quantiles. We show the efficiency of our method independently of the choice of underlying regressor through various implementations, namely copulas, quantile forests, and neural networks.

Algorithm 1: The bQCD algorithm

Inputs:

• {X_i, Y_i}_{i=1}^n, n i.i.d. observations of the random variables X and Y;

• qr, a quantile regression method;

• m, the number of quantile levels (weight/node pairs used to approximate the integral).

1. Transform X and Y to standard normal.
2. Compute the quadrature weights and nodes {w_j, τ_j}_{j=1}^m.
3. Compute the marginal and conditional scores for X → Y (and similarly for Y → X):
   for j = 1 to m do
     • Use the empirical quantile to compute q_{X,τ_j} and S_{X,τ_j} = Σ_{i=1}^n S_{τ_j}(q_{X,τ_j}, X_i).
     • Use the method qr to compute q_{Y|X,τ_j} and S_{Y|X,τ_j} = Σ_{i=1}^n S_{τ_j}(q_{Y|X=X_i,τ_j}, Y_i).
   end for
4. Compute the scores S_⋆ = Σ_{j=1}^m w_j S_{⋆,τ_j} for ⋆ ∈ {X, Y, Y|X, X|Y}.
5. Define S_{X→Y} = S_X + S_{Y|X} and S_{Y→X} = S_Y + S_{X|Y}.
   if S_{Y→X} > S_{X→Y} then define the causal direction as d = X → Y;
   else if S_{Y→X} < S_{X→Y} then define the causal direction as d = Y → X.
   Define the confidence score as C = S_{Y→X} − S_{X→Y}.

Output: (d, C)

Nonparametric copulas. Copulas (i.e., multivariate distributions with uniform margins) represent an appealing alternative to compute conditional quantiles. In a nutshell, copulas make it possible to flexibly model the dependence structure between random variables while avoiding assumptions about the scales, functional forms, or other restrictions imposed when dealing with marginal distributions. Following Joe (1996), copulas lead to a useful representation of the distributions of Y | X and X | Y, namely F_{Y|X}(x, y) = P(Y ≤ y | X = x) = P(V ≤ v | U = u) = ∂_u C(u, v), where C is a copula, ∂_u C(u, v) = ∂C(u, v)/∂u, u = F_X(x), and v = F_Y(y), and similarly F_{X|Y}(x, y) = ∂_v C(u, v). This means that conditional distributions can be evaluated by taking partial derivatives of the copula function. The τ-quantiles of Y | X can then be written as F^{-1}_{Y|X}(x, τ) = F_Y^{-1}((∂_u C)^{-1}(u, τ)), and similarly for F^{-1}_{X|Y}(τ, y), where the inverses of the copula derivatives are taken with respect to τ. Using the above equations, one can then compute all τ-quantiles using the marginal distributions and the copula. To benefit from a fully nonparametric estimation, in practice we use nonparametric copulas (Geenens et al., 2017). We implemented the copula-based version of bQCD using the R (R Core Team, 2017) interface to vinecopulib, called rvinecopulib (Nagler & Vatter, 2018), and Gauss–Legendre quadrature from the package statmod (Smyth, 2005). We refer to Section B.1 for more details.
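To make the formula above concrete, the next snippet evaluates the conditional-quantile expression for the one copula family where ∂_u C and its inverse are available in closed form, the Gaussian copula. This is only a toy stand-in for the nonparametric transformation estimator of Section B.1, and the function name cond_quantile_gauss is ours.

```r
# F^{-1}_{Y|X}(x, tau) = F_Y^{-1}((d_u C)^{-1}(u, tau)) for a Gaussian copula with correlation rho.
cond_quantile_gauss <- function(x, tau, rho, F_X = pnorm, F_Y_inv = qnorm) {
  u <- F_X(x)                                                # u = F_X(x)
  v <- pnorm(rho * qnorm(u) + sqrt(1 - rho^2) * qnorm(tau))  # inverse h-function (d_u C)^{-1}(u, tau)
  F_Y_inv(v)                                                 # map back through the margin of Y
}

# With standard normal margins and rho = 0.7, this reproduces the usual conditional quantile
# qnorm(0.9, mean = 0.7 * 1, sd = sqrt(1 - 0.7^2)) at x = 1:
cond_quantile_gauss(1, tau = 0.9, rho = 0.7)
```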

Quantile forest. Another nonparametric and consistent way of estimating conditional quantiles is via quantile regression forests. In contrast to random forests (Breiman, 2001), for each node in each tree, a quantile regression forest keeps the values of all observations in this node, not just their mean, which allows us to assess the conditional distribution based on this information. For a detailed explanation of the algorithm and its consistency, we refer the reader to the original paper (Meinshausen, 2006). In the experiments we used nodesize=10, sampsize=50, and n_trees=500 for both regression directions.
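A possible plug-in regressor for the sketch after Algorithm 1, assuming the quantregForest R package that accompanies Meinshausen (2006); the wrapper name qr_forest is ours, the hyperparameters are the ones stated above, and argument names may differ slightly across package versions.

```r
# Quantile regression forest as the qr_fun argument of the bqcd() sketch above.
library(quantregForest)

qr_forest <- function(x, y, taus) {
  qrf <- quantregForest(x = data.frame(x = x), y = y,
                        nodesize = 10, sampsize = 50, ntree = 500)
  predict(qrf, newdata = data.frame(x = x), what = taus)  # n x length(taus) matrix of quantiles
}

# Example usage: bqcd(x, y, qr_fun = qr_forest)
```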


Quantile neural network. Recently, quantile estimation has become of interest in the deep learning community as part of the uncertainty estimation framework. We leverage such an approach for a flexible approximation of the conditional quantiles, which are estimated simultaneously by a neural network trained with the pinball loss, i.e., the quantile score used as a loss function. In the experiments we use the qrnn package by Cannon (2018), which includes monotonicity constraints to guard against quantile crossing. Again, for both sides (possible causal directions) we use symmetric models: a two-layer neural network with 10 neurons per layer and sigmoid nonlinearities. The networks were trained for 1000 iterations.
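Similarly, a quantile neural network can serve as the plug-in regressor. The sketch below assumes the mcqrnn interface of the qrnn package (Cannon, 2018), with the architecture and training length stated above; the argument names are given to the best of our knowledge and should be checked against the installed package version.

```r
# Monotone composite QRNN as the qr_fun argument of the bqcd() sketch above.
library(qrnn)

qr_nnet <- function(x, y, taus) {
  fit <- mcqrnn.fit(x = as.matrix(x), y = as.matrix(y),
                    n.hidden = 10, n.hidden2 = 10,   # two hidden layers, 10 neurons each
                    tau = taus, iter.max = 1000)     # trained for 1000 iterations
  mcqrnn.predict(as.matrix(x), fit)                  # non-crossing quantiles, one column per tau
}
```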

Figure 8. bQCD robustness with respect to the implementation (copula, random forest, neural network) for different numbers of quantile levels (m = 1, 3, 7); each panel reports accuracy (%) on the SIM, SIM-c, SIM-ln, SIM-G, AN, AN-s, LS, LS-s, MN-U, and Tübingen benchmarks.

B.1. Predicting Quantiles with Copulas

While not the only statistical model satisfying Assumption 2, copulas (i.e., multivariate distributions with uniform margins) represent an appealing alternative. Any bivariate nonparametric estimator allowing to compute the required inverse conditional CDFs could be used, but, to the best of our knowledge, no available implementation allows directly for this. With copulas, obtaining conditional quantiles is straightforward through (18), and they are implemented efficiently because they appear in higher-dimensional models such as vines (Bedford et al., 2002; Aas et al., 2009).

According to the theorem of Sklar (1959), any F can be represented by its marginal distributions F_X, F_Y and a copula C, which is the joint distribution of (U, V) = (F_X(X), F_Y(Y)). In other words, for any F, there exists a C such that F(x, y) = C(F_X(x), F_Y(y)) for each (x, y) ∈ R². Moreover, if all the distributions are continuous, then C is unique. Following Joe (1996), copulas lead to a useful representation of the distributions of Y | X and X | Y, namely F_{Y|X}(x, y) = P(Y ≤ y | X = x) = P(V ≤ v | U = u) = ∂_u C(u, v), where ∂_u C(u, v) = ∂C(u, v)/∂u, u = F_X(x), and v = F_Y(y), and similarly F_{X|Y}(x, y) = ∂_v C(u, v). This means that conditional distributions can be evaluated by taking partial derivatives of the copula function. The τ-quantiles of Y | X can then be written as

\[
F^{-1}_{Y \mid X}(x, \tau) = F_Y^{-1}\big( (\partial_u C)^{-1}(u, \tau) \big), \tag{18}
\]

and similarly for F^{-1}_{X|Y}(τ, y), where the inverses of the copula derivatives are taken with respect to τ. Using (18), one can then compute all τ-quantiles using the marginal distributions and the copula.

Assume that we have n i.i.d. random variables {(X_1, Y_1), ..., (X_n, Y_n)}. To avoid relying on restrictive assumptions, S_{X→Y} and S_{Y→X} are estimated completely nonparametrically. Note that, if all the considered distributions are differentiable, F(x, y) = C(F_X(x), F_Y(y)) implies

\[
f(x, y) = c\{F_X(x), F_Y(y)\}\, f_X(x)\, f_Y(y), \tag{19}
\]

where f, c, f_X, and f_Y are the densities corresponding to F, C, F_X, and F_Y, respectively. Equation (19) has an important implication for inference: because the right-hand side is a product, the joint log-likelihood can be written as a sum of the log-likelihood of each margin and the log-likelihood of the copula. This fact can be conveniently exploited in a two-step procedure: estimate the margins separately to obtain F_X and F_Y, and then take the probability integral transform of the data using those margins, that is, define U_i = F_X(X_i) and V_i = F_Y(Y_i), to estimate C.

For the first step, we simply use the empirical distribution F_X(x) = Σ_{i=1}^n 1{X_i ≤ x}/(n + 1) (and similarly for Y), where n + 1 is used instead of n in the copula context to avoid boundary problems. Discrete datasets are handled by jittering, that is, breaking ties at random. As for the second step, since typical nonparametric estimators are targeted at densities with unbounded support, they are unsuited to densities restricted to [0, 1]². To get around this issue, Scaillet et al. (2007) suggest


to first transform the data to standard normal margins, and then use any nonparametric estimator suited to unbounded densities. The transformation estimator of the copula density c(u, v) is then defined as

\[
\hat c(u, v) = \frac{\frac{1}{n} \sum_{i=1}^{n} g\big(\Phi^{-1}(u) - \Phi^{-1}(U_i),\, \Phi^{-1}(v) - \Phi^{-1}(V_i)\big)}{\phi(\Phi^{-1}(u))\, \phi(\Phi^{-1}(v))}, \tag{20}
\]

where U_i = F_X(X_i), V_i = F_Y(Y_i), g is a bivariate nonparametric (kernel) estimator, and φ, Φ denote the standard normal density and distribution, respectively. Following Scaillet et al. (2007) and Geenens et al. (2017), we then use a bivariate Gaussian kernel estimator for g. The consistency and asymptotic normality of this estimator are derived in Geenens et al. (2017). Given that the empirical distribution is also consistent, consistency of the resulting joint distribution is also straightforward. Furthermore, using the fact that F_X(x) = F(x, ∞), F_Y(y) = F(∞, y), F_{X|Y}(x, y) = ∂_y F(x, y)/ ∂_y F(x, y)|_{x→∞}, and F_{Y|X}(x, y) = ∂_x F(x, y)/ ∂_x F(x, y)|_{y→∞}, consistent estimators of the marginal and conditional distributions are obtained by replacing F by the copula-based joint estimator and using the same relations.

The transformation kernel estimator for bivariate copula densities is implemented in C++ as part of vinecopulib (Nagler & Vatter, 2017), a header-only C++ library for copula models based on Eigen (Guennebaud et al., 2010) and Boost (Schaling, 2011). From (20), vinecopulib constructs and stores a 30 × 30 grid over [0, 1]² along with the evaluated density at the grid points (see footnote 6). Then, a cubic-spline approximation makes it possible to efficiently compute the copula distribution C(u, v) and its derivatives, ∂_u C(u, v) and ∂_v C(u, v), as the integrals of the spline approximation of the density admit an analytic expression. Finally, vinecopulib implements the numerical inversion of ∂_u C(u, v) and ∂_v C(u, v) to compute (∂_u C)^{-1}(u, v) and (∂_v C)^{-1}(u, v) by a vectorized version of the bisection method.
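As a pointwise illustration of (20), the following sketch evaluates the transformation estimator with a product Gaussian kernel and a single fixed bandwidth h; the actual estimator (Geenens et al., 2017; vinecopulib) uses data-driven bandwidths and the grid-plus-spline storage described above, and the function name copula_density_hat is ours.

```r
# Transformation kernel estimator of the copula density at a single point (u, v),
# given pseudo-observations U, V in (0, 1).
copula_density_hat <- function(u, v, U, V, h = 0.3) {
  zu <- qnorm(u); zv <- qnorm(v)                       # evaluation point on the standard normal scale
  g  <- mean(dnorm((zu - qnorm(U)) / h) * dnorm((zv - qnorm(V)) / h)) / h^2  # product Gaussian kernel estimate
  g / (dnorm(zu) * dnorm(zv))                          # divide by phi(z_u) * phi(z_v) as in (20)
}
```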

C. Baseline implementation details

C.1. Baselines: implementations and hyperparameters

For IGCI, we use the original implementation of Janzing et al. (2012) with slope-based estimation and both the Gaussian and the uniform reference measure. For LINGAM, we use the implementation of Peters et al. (2014), which also provides RESIT with GP regression and the HSIC independence test with a threshold value α = 0.05. For CAM, we use the R package (Peters & Ernest, 2015) with the default parameters. For GR-AN and EMD, we use the code of Hernandez-Lobato et al. (2016). The parameters for EMD on simulated data are as in the original paper, λ = 1e−3 and σ = S_m/5 (S_m being the median of distances across all input patterns). For the real data benchmark, GR-AN was evaluated over 15 subsamples limited to 500 observations, and the results are then averaged. However, the parameters are tuned and selected from the overall best results; GR-AN has a built-in function that takes care of parameter tuning.

For Slope, we use the implementation of Marx & Vreeken (2017), with local regression included in the fitting process. For PNL, GPI-MML, and ANM, we use the MATLAB implementations from the Cause-Effect Pairs Challenge, with the same setup as RESIT for ANM. For Sloppy, we use the results as provided with the code of Marx & Vreeken (2019) with the AIC score. For RECI, we use an example MATLAB code provided by the authors with the 'log' implementation, with and without the data preprocessing step. Because there is no publicly available implementation of CURE, we use the results from files obtained by Marx & Vreeken (2017). Our code and the datasets used for the paper are included in the submitted supplementary package.

All experiments are carried out using R version 3.4.1 on a MacBook Pro (mid-2015) with 16GB memory, a 2.2 GHz Intel Core i7 processor and a 250GB SSD.

6 After extensive simulations comparing copula densities to their kernel approximations, the authors of vinecopulib noticed that the precision gains achieved by increasing the grid beyond 30 × 30 were marginal. Using a grid is not actually required to implement the kernel-based estimator, but evaluating kernels for each call to the copula's density is computationally expensive, and storing the values is a time-memory trade-off.


C.2. Computational complexity

Figure 9. Linear scaling of bQCD: CPU time (sec) as a function of the sample size n, over 100 pairs with m = 1.

D. Additional results from experiments

D.1. Tabular results for spider plots

Table 2. Accuracy on simulated and real data.

Method         SIM    SIM-c   SIM-ln  SIM-G   AN      AN-s    LS      LS-s    MN-U    Tub     Tub weigh.
bQCD m = 1     0.57   0.69    0.69    0.59    0.99    0.47    1       0.93    0.23    0.64    0.74
bQCD m = 3     0.61   0.72    0.76    0.65    1       0.81    1       0.98    0.99    0.69    0.77
bQCD m = 7     0.65   0.76    0.74    0.66    1       0.82    1       0.98    0.99    0.67    0.75
IGCI - u       0.42   0.49    0.52    0.54    0.41    0.27    0.63    0.37    0.07    0.67    0.72
IGCI - g       0.36   0.46    0.62    0.85    0.98    0.98    0.99    0.94    0.74    0.61    0.64
biCAM          0.57   0.6     0.87    0.81    1       1       1       0.53    0.86    0.57    0.58
Slope          0.45   0.54    0.47    0.48    0.18    0.18    0.21    0.17    0.07    0.75    0.83
RESIT          0.78   0.82    0.87    0.77    1       1       0.6     0.03    0.05    0.53    0.63
ANM            0.76   0.81    0.80    0.77    1       1       0.62    0.09    0.03    0.59    0.59
LINGAM         0.42   0.53    0.31    0.25    0       0.04    0.07    0.03    0       0.27    0.42
EMD            0.45   0.4     0.52    0.58    0.36    0.33    0.6     0.42    0.83    0.6     0.68
GRAN           0.48   0.44    0.43    0.37    0.05    0.06    0.11    0.20    0.5     0.4     0.5
GPI            0.82   0.86    0.88    0.94    0.95    0.11    0.91    0.54    0.90    0.6     0.63
PNL            0.70   0.65    0.61    0.64    0.96    0.63    0.91    0.44    0.66    0.75    0.73
Sloppy         0.64   0.62    0.77    0.81    1       1       1       0.56    0.96    0.59    0.74
RECI w/wo pp   0.7/0.6  0.71/0.65  0.8/0.71  0.66/0.67  0.72/0.72  0.15/0.17  0.82/0.67  0.71/0.52  0.88/0.88  0.61/0.63  0.7/0.67

Table 3. Accuracy on simulated and real data for different selections of τ values.

τ levels               SIM    SIM-c   SIM-ln  SIM-G   AN     AN-s    LS     LS-s    MN-U    Tueb    Tueb-w
τ ∈ [0.1, 0.5, 0.9]    0.61   0.72    0.76    0.65    1      0.81    1      0.98    0.99    0.69    0.77
τ ∈ [0.4, 0.5, 0.6]    0.56   0.71    0.72    0.58    1      0.45    1      0.95    0.25    0.63    0.73

Table 4. Area under the ROC and PR curves for bQCD.

           SIM    SIM-c   SIM-ln  SIM-G   AN     AN-s    LS     LS-s    MN-U
ROC-AUC    0.67   0.79    0.87    0.69    1      0.9     1      1       1
PR-AUC     0.67   0.82    0.86    0.7     1      0.89    1      1       1


Note that, due to RESIT being undecided in a few of the real data pairs, we observe random-classifier behavior at the beginning of its ROC curve in Figure 10.

Figure 10. (a) ROC curves for bQCD on all benchmarks (AN, AN-s, HN, HN-s, SIM, SIM-c, SIM-ln, SIM-G, MN-U, Tueb); (b) ROC curves for all baselines on the Tübingen pairs; (c) weighted accuracy vs. decision rate for all baselines on the Tübingen dataset.

E. Extensions of the pairwise method

Recent work by Goudet et al. (2017a) suggests that pairwise and CPDAG (i.e., the skeleton and the v-structures) learning procedures can suitably complement each other. In this section, we follow a similar approach to suggest an extension of bQCD to multivariate datasets: start from the CPDAG resulting from another method, and then use bQCD to orient the edges. We rank the edges and include them in the graph sequentially, starting from the edge with the highest confidence, while checking the acyclicity of the resulting graph after each addition (a minimal sketch of this step is given at the end of this section). Note that this approach also requires a final verification to test edge orientation and v-structure consistency (Goudet et al., 2017a).

We do this by using CAM to learn a CPDAG and then using bQCD to orient its edges. The rationale is that, while Buhlmann et al. (2014) prove the consistency of the variable ordering used by CAM to learn the structure even when its assumptions are not met, bQCD is better at pairwise discoveries (see Section 3). We could also have used the PC algorithm (Spirtes et al., 2000b).

We now turn to the well-known protein causal dataset by Sachs et al. (2005), for which a ground-truth DAG of the causal structure is provided. We used the cd3cd28 dataset with 853 observations of 11 proteins. For evaluation, we use the structural Hamming distance (SHD), as proposed in (Tsamardinos et al., 2006), i.e., the number of edge additions, removals, or reversals necessary to transform one graph into the other, and the structural intervention distance (SID), as in (Pet, 2015), which is considered appropriate for quantifying the correct order among variables by estimating the causal effects they entail. CAM by itself outputs a DAG with SID_CAM = 53 and SHD_CAM = 17, while with bQCD we can reduce this to an average SID_bQCD = 46 (2.5) and an average SHD_bQCD = 15.1 (1.6). In Table 5, we provide the average results over ten random sub-samples of the dataset where the edges of CAM's CPDAG were oriented by each of the pairwise methods.7 From the presented results, we note that bQCD achieves the best scores in terms of SHD and SID.

As such, it is reassuring that bQCD is able to correctly decide the causal direction, even though other dependencies affect the pairwise (direct) causal relationships in such a confounded dataset. While these results are promising, a consistent hybrid method extending bQCD and quantile-based methods to higher-dimensional datasets is left for further work. Even in this simplified setting, if one were to use Decision rule 1 on all pairwise adjacent vertices in a CPDAG, one may still get a PDAG (i.e., due to non-decisions). Furthermore, extensions to higher-dimensional settings could use quantile-based independence tests in RESIT, or replace mean regression in CAM by quantile regression.

7 We only included R-based implementations.
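A minimal sketch (base R, hypothetical helper names) of the edge-addition step described at the beginning of this section: bQCD-oriented edges are added in decreasing order of confidence, and an orientation is rejected whenever it would create a cycle. The final check of v-structure consistency mentioned above is not included.

```r
# Depth-first search test for a directed cycle in an adjacency matrix.
has_cycle <- function(adj) {
  n <- nrow(adj); state <- integer(n)            # 0 = unseen, 1 = on the current path, 2 = done
  visit <- function(v) {
    state[v] <<- 1
    for (w in which(adj[v, ])) {
      if (state[w] == 1 || (state[w] == 0 && visit(w))) return(TRUE)
    }
    state[v] <<- 2
    FALSE
  }
  any(sapply(1:n, visit))
}

# edges: two-column matrix of bQCD-oriented edges (from, to); conf: their confidence scores.
orient_cpdag <- function(edges, conf) {
  n <- max(edges); adj <- matrix(FALSE, n, n)
  for (k in order(conf, decreasing = TRUE)) {
    adj[edges[k, 1], edges[k, 2]] <- TRUE        # tentatively add the most confident edge
    if (has_cycle(adj)) adj[edges[k, 1], edges[k, 2]] <- FALSE  # undo if it creates a cycle
  }
  adj
}
```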


Figure 11. Causal protein network as obtained by CAM's CPDAG oriented with bQCD (nodes: RAF, MEK, PLCG, PIP2, PIP3, ERK, AKT, PKA, PKC, P38, JNK). Blue edges are correctly oriented.

Table 5. SID and SHD for the Sachs dataset using the CAM CPDAG with causal directions from pairwise methods, by 10-fold cross-validation. Averages and standard deviations.

        bQCD         IGCI-u       Slope        RESIT     CAM          EMD         LINGAM        GRAN         Random
SID     46.5 (2.5)   50.3 (1.9)   54.5 (3.3)   51 (0)    53.4 (1.4)   51.5 (2.7)  48.9 (0.87)   49.5 (4)     52.5 (3.6)
SHD     15.1 (1.6)   15.8 (1.7)   17.6 (1.2)   14 (0.3)  15.4 (1.6)   16 (1.7)    14.3 (0.82)   16.4 (1.34)  17.2 (1.4)

F. Scatter plots of simulated pairs

Figure 12. Scatter plots of simulated cause-effect pairs from the SIM, SIM-ln, and SIM-G benchmarks (first, second, and third row, respectively); each panel is labelled with the true causal direction (x → y or x ← y).



●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

● ●

●● ●

●●

●●

●●●

●●

●●

●●

●●

●●

0.0 0.5 1.0 1.5 2.0

−4

−2

02

4

x <− y

x

y

●●●

●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●●

● ●

● ●

●●

●●

●● ●

●●

●●

●●

● ● ●

● ●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

● ●

●●

●●

●●

●●●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●● ●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●● ●●

● ●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

−2 0 2 4

01

23

45

x −> y

x

y

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●● ●●

● ●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●● ●●

● ●

●●

● ●

●●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●

●●●

●●

●●

●●

●●

−4 −2 0 2 4

−1.

00.

00.

51.

01.

52.

0

x −> y

x

●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

●●

●●●

● ●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●

●●

●●

●●

● ●

●●

● ●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

● ●●

●●

●●

●●

●●

●●

● ● ●

●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

−1 0 1 2

−4

−2

02

x <− y

x

y

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●● ●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

● ●

● ●

●● ●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●● ●

●●

●●●

●●

●●●

●●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

● ● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●●●

●●

● ●

● ●

−1.0 0.0 0.5 1.0 1.5 2.0

−4

−2

02

4

x <− y

x

y ●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●●● ●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●●

● ●

●●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

−1.0 0.0 0.5 1.0 1.5 2.0

−2

02

4

x <− y

x

y

● ●

●●

●●

●●

●●

●●

● ●

●●

● ● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

● ●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●

●●

● ●

● ●

●●

●●

● ●●

●●

●●

●●

●●

●●

● ●

● ●

●● ●

●●

●●

● ●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●●

●●

●●

●●

−4 −2 0 2 4

−0.

50.

00.

5

x −> y

x

●●●

●●

●●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●● ●

●●

●●

● ●●

●●

● ●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

● ●●

● ●

●●●

●●●●

●●

●●

●●

●●

●●

−4 −2 0 2

−1.

0−

0.5

0.0

0.5

x −> y

x

y

●●●

●●

●●

●●●

●●

●●

●●●

●●

●● ●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

● ●

●●● ●

●●

●●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

● ●

● ●

● ●

●●

● ●

●●

●●

●● ●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

● ●

●●

● ●●

● ●●

● ●

●●

●●●

● ●

●●

●●

●●

● ●

●●

● ●

● ●●

●●

●●

● ●

●●

●●

●●

●●

● ●●

−4 −2 0 2 4

−0.

6−

0.2

0.2

0.6

x −> y

x

y

●●

● ●●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●●

● ●●

●● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

● ●

●●

●●

●● ●●

●●

●●

● ●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●●

● ●

● ●●

●●

●●

●●

●●

●●

●●●

●●

● ● ●

●●

●●

●●

●●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

●● ●

●●

●●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

● ●

●●

●● ●

● ●

●●

●●

● ●

●●

●● ●

●●

●●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

−0.5 0.0 0.5

−2

02

x <− y

x

y

Figure 13. By row AN/AN-s/LS/LS-s/MN.