Downloading Tetrad Plus a Quick Introduction to Non-Gaussian Orientation

Joseph Ramsey

Tetrad Source

• The Tetrad source code is freely available, under the GNU GPL license; you just have to know where to look!

• Look in the Tetrad downloads directory (link on the main Tetrad page).

• Look for the latest “dist” (distribution) file and unzip it.

• Install ant (google it).

• Run the “run” target in ant.

• I post new versions periodically, so check back.

• All of the code will be in the distribution, except for private project code.

• This can be useful if you want to modify or extend algorithms, or if you want to set up specific kinds of testing, or if the command line tools provided are insufficient for your needs.

Java

• The source code is in Java, which can be interfaced with several other platforms with a bit of work.

• It can be called from Matlab, R, and Mathematica, and also programmatically from the command line in various languages.

• Also, since it’s in Java, it’s cross-platform, so it will run on your machine (so long as it’s running a recent version of Windows, Mac, Linux, or Solaris).

Command Line Tetrad

• Jeremy Espino will talk tomorrow about software that will be available through the Center for Causal Discovery.

• However, some people have asked about command line tools for Tetrad.

• We have an unsophisticated command-line tool for several of the main algorithms that has proven useful to people in the past.

How to get the command-line tool

• Go to the Tetrad downloads directory,

http://www.phil.cmu.edu/projects/tetrad_download/download/

• Look for files beginning with the prefix “tetradcmd-”.

• Pick the one with the latest version number.

How to run a search at the command line...

Example:

java -jar tetradcmd-5.1.0-10.jar -data munin1.txt -datatype discrete -algorithm pc -depth 3 -significance 0.05

Command line options

• -data: the data file

• -datatype: continuous or discrete (mixed not supported)

• -algorithm: pc, cpc, fci, cfci, ccd, ges

• -depth: default is -1 (unlimited)

• -significance: default is 0.05

• ... and some others; send me email and I’ll send you the man page.
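
Since the tool can be driven programmatically, here is a minimal Python sketch that assembles the example command from the options listed on this slide. The function name `build_tetradcmd_args` and its defaults are illustrative and not part of Tetrad; only the flags and the allowed values come from the slides, and the jar and data file names are the ones from the example.

```python
# Allowed values as listed on the slide.
ALGORITHMS = {"pc", "cpc", "fci", "cfci", "ccd", "ges"}
DATATYPES = {"continuous", "discrete"}  # mixed data is not supported


def build_tetradcmd_args(jar, data, datatype, algorithm,
                         depth=-1, significance=0.05):
    """Return the argument list for one tetradcmd run (hypothetical helper)."""
    if datatype not in DATATYPES:
        raise ValueError(f"unsupported datatype: {datatype}")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return [
        "java", "-jar", jar,
        "-data", data,
        "-datatype", datatype,
        "-algorithm", algorithm,
        "-depth", str(depth),
        "-significance", str(significance),
    ]


args = build_tetradcmd_args("tetradcmd-5.1.0-10.jar", "munin1.txt",
                            "discrete", "pc", depth=3)
print(" ".join(args))
```

To actually run the search, the list can be passed to `subprocess.run(args, check=True)`, assuming Java is installed and the jar is in the working directory.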

IMaGES command line

• IMaGES (which Clark mentioned) uses its own command line interface.

• Email me if you’d like to use it:

[email protected]

But again…

• The Center for Causal Discovery is developing a set of algorithmic tools that it will release separately.

• These will include more scalable and accurate versions of several of the Tetrad algorithms.

• Jeremy Espino will talk about these.

While you’re listening…

• Download the Tetrad session lingam.tet from Richard’s download directory.

• http://www.phil.cmu.edu/projects/tetrad_download/download/workshop/Data/

• We’ll use it.

• I want to go into more detail for some of the algorithms Clark mentioned in his talk and do demos of them.

Quick Review

[Figure: two graphs. Left: “If the DAG is this…” Right: “PC yields this… with 3 ambiguously oriented edges.”]

PC and GES yield patterns.

Why?

• PC makes all of its decisions about adjacency and orientation based on judgments of conditional independence.

• If the data are generated by a linear model with Gaussian errors, PC’s pattern is the best that you can do (as Richard explained).

• But what if you relax these assumptions? In some cases, you can do better.

• Not with algorithms that rely on conditional independence alone, though: there PC’s pattern is the best you can do, no matter how you calculate the independencies.

LiNGAM

• LiNGAM = “Linear Non-Gaussian Acyclic Model”

• Clark talked about this briefly; I’ll give some more detail. Clark pointed out that the assumption that errors are Gaussian is replaced by the assumption that the errors are non-Gaussian (not bell-shaped curves).

• Acyclicity is still assumed, as is linearity.

• Under these assumptions, the original DAG can be recovered.

In other words

[Figure: two graphs side by side, labeled “PC” and “LiNGAM”.]

How?!

• There are various ways to do it; I will describe one of these.

• The ICA algorithm (Independent Components Analysis) tries to solve the cocktail party problem, i.e. to figure out which voices are speaking from the recordings of microphones placed around the room.

The voices are the independent components; the microphones get weighted sums of the independent components as input.

How?!

• It turns out that if the independent components are non-Gaussian (or all but one of them are), then this problem can be solved.

• You can infer from the microphone data back to the voices!

But why? And stop dodging the question!

• The Central Limit Theorem states that, in the limit, normalized sums of i.i.d. non-Gaussian variables (with finite variance) converge to a Gaussian.

• The finite-sample effects of this have no simple analytic form, but in general the sum of two non-Gaussian variables will be more Gaussian than either one of them, the sum of three more Gaussian still, etc.

• So the errors (which are assumed to be i.i.d.) have to be the most non-Gaussian variables in the SEM IM, compared with all of the variables that are descendants of them!
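
The "sums are more Gaussian" claim can be checked numerically. The sketch below is an illustration, not anything from Tetrad: it uses excess kurtosis (0 for a Gaussian) as a simple stand-in measure of non-Gaussianity and compares one U(0, 1) variable with sums of two and three independent ones. The helper name `excess_kurtosis`, the seed, and the sample size are arbitrary choices.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis: 0 for a Gaussian, negative for flat-topped data."""
    m = statistics.fmean(xs)
    var = statistics.fmean([(x - m) ** 2 for x in xs])
    fourth = statistics.fmean([(x - m) ** 4 for x in xs])
    return fourth / var ** 2 - 3.0


rng = random.Random(0)
n = 100_000
e1 = [rng.random() for _ in range(n)]  # U(0, 1) errors
e2 = [rng.random() for _ in range(n)]
e3 = [rng.random() for _ in range(n)]

k1 = excess_kurtosis(e1)
k2 = excess_kurtosis([a + b for a, b in zip(e1, e2)])
k3 = excess_kurtosis([a + b + c for a, b, c in zip(e1, e2, e3)])

# In theory the excess kurtosis of a sum of m i.i.d. variables is the
# single-variable value divided by m: -1.2, -0.6, -0.4 for U(0, 1).
print(k1, k2, k3)
```

Each added term pulls the statistic closer to the Gaussian value of 0, which is the effect the orientation methods exploit.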

Now recall what a SEM IM looks like…

X1 = E_X1

X2 = E_X2

X3 = a1 * X1 + a2 * X2 + E_X3 = a1 * E_X1 + a2 * E_X2 + E_X3

So by the CLT X3 should be more Gaussian than E_X1 or E_X2.

ICA finds the linear combinations that maximize non-Gaussianity of the residuals.

Associating variables with errors

• This method loses the information about which error is for which variable.

• But the coefficient matrix for a DAG must be lower triangular when the variables are listed in a causal order.

• So we find the matrix of coefficients, and then permute the order of the variables until a lower triangular matrix is found.
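
As a toy illustration of the permutation step, the sketch below searches for a variable order that makes a coefficient matrix strictly lower triangular. It assumes exact zeros and brute-forces all orders, which only works for a handful of variables; with estimated coefficients one would instead look for the order that comes closest to triangular. The function names and the example matrix are made up for illustration.

```python
from itertools import permutations


def is_lower_triangular(B, order, tol=1e-9):
    """True if B, with rows and columns reordered by `order`, is strictly lower triangular.

    B[i][j] is the coefficient of variable j in the equation for variable i.
    """
    for r, i in enumerate(order):
        for c, j in enumerate(order):
            if c >= r and abs(B[i][j]) > tol:
                return False
    return True


def find_causal_order(B):
    """Return the first order that triangularizes B, or None if there is none."""
    for order in permutations(range(len(B))):
        if is_lower_triangular(B, order):
            return list(order)
    return None


# Variables stored in scrambled order: index 0 = X3, index 1 = X1, index 2 = X2,
# with X2 = 0.3 * X1 + E_X2 and X3 = 0.5 * X1 + 0.7 * X2 + E_X3.
B = [
    [0.0, 0.5, 0.7],  # X3 depends on X1 and X2
    [0.0, 0.0, 0.0],  # X1 has no parents
    [0.0, 0.3, 0.0],  # X2 depends on X1
]
order = find_causal_order(B)
print(order)  # indices in causal order: X1, X2, X3
```

A matrix with a cycle, such as `[[0, 1], [1, 0]]`, has no such order, and the search returns `None`.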

But does it work? Tetrad demo

• Open the session lingam.tet in Tetrad (or follow along).

• Look at the model on the left.

• Notice that the errors have been set to have very non-Gaussian distributions (U(0, 1)).

• Run PC and LiNGAM; which one is better?

• Now look at the model in the middle.

• The errors have been set to Normal(0, 1).

• Now which of PC or LiNGAM is better?

You can make new examples!

• Right click on Graph2 and select “Edit Parameters…”

• Set “Create Cyclic Graph” to “False”

• Right click on Graph2 and select “Propagate Changes Downstream”

What if you have some cycles?

• In principle, the adjacencies of PC should be a superset of the adjacencies of a cyclic model.

• There are pairwise methods that can take variables that are adjacent, even in cyclic models, and orient the edge between them.

• So PC and a pairwise method can be combined: run the PC adjacency search, and orient each edge using a pairwise procedure.

• The risk is possibly including too many edges in the graph.

Note

• Pairwise orientation is impossible for linear, Gaussian models, and for that matter for any method that relies on conditional independence alone.

• Thus, pairwise orientation requires that variables be non-Gaussian, or that connection functions be non-linear, or both.

• Clark mentioned R1, R2, R3, R4, etc.

R3: A fact about entropy.

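• Define NG(X) := D(P(X) || G(X)) = ½ ln var(X) + c - H(X), the Kullback-Leibler divergence of the distribution of X from the Gaussian G(X) with the same mean and variance; H(X) is the entropy of X and c = ½ ln(2πe).

• Standardize X and Y.

• Assume:

  • Y = aX + E, where E is independent of X

  • X = bY + E*, where E* is not independent of Y

• Then NG(X) + NG(Y, X) > NG(Y) + NG(X, Y)

• We use the Anderson-Darling statistic to estimate NG.

• Pairwise!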

Tanh, Skew, RSkew

• Due to Hyvärinen and Smith (2013).

• Approximations to R = 1/T log L(X->Y) - 1/T log L(Y->X), using the LiNGAM model likelihood.

• R is estimated as ρ E{g(X)Y - X g(Y)}, where ρ is the correlation of X and Y, for g1(X) = -tanh(X), g2(X) = X^2, g3(X) = log(cosh(max(X, 0))) (Tanh, Skew and RSkew respectively).

• When R > 0 orient X->Y; otherwise, Y->X.

• Also pairwise!
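
To make the Skew variant concrete, here is a minimal sketch of the estimate ρ E{g(X)Y - X g(Y)} with g(X) = X^2 on simulated data. It is an illustration under stated assumptions, not the Tetrad implementation: the true model is X->Y with positively skewed (exponential) errors, which is what makes the sign of R informative for this choice of g; the function names, coefficient 0.8, seed, and sample size are arbitrary.

```python
import math
import random


def standardize(xs):
    """Center to mean 0 and scale to variance 1."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]


def skew_measure(x, y):
    """Estimate R = rho * E{g(X)Y - X g(Y)} with g(u) = u^2 on standardized data."""
    x, y = standardize(x), standardize(y)
    n = len(x)
    rho = sum(a * b for a, b in zip(x, y)) / n
    return rho * sum(a * a * b - a * b * b for a, b in zip(x, y)) / n


rng = random.Random(1)
n = 50_000
x = [rng.expovariate(1.0) for _ in range(n)]      # skewed cause
y = [0.8 * a + rng.expovariate(1.0) for a in x]   # Y = 0.8 * X + skewed error

r_xy = skew_measure(x, y)
direction = "X->Y" if r_xy > 0 else "Y->X"
print(r_xy, direction)
```

Swapping the arguments flips the sign of the estimate, so the rule "R > 0 means X->Y" gives the same answer whichever variable is passed first.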

Does it work? Tetrad Demo

• Consider the rightmost model in lingam.tet.

• Examine PC and LOFS, and then LiNGAM. Which is best?

• Again, you can make a new example!

• Right click on Graph2 and select “Edit Parameters…”

• Set “Create Cyclic Graph” to “True”

• Right click on Graph2 and select “Propagate Changes Downstream”

Choice of pairwise method

• Notice in LOFS that the method has been set to R3.

• Try setting the method to Skew or RSkew and click Search.

• The result for this task is usually a bit worse, though with fMRI data Skew and RSkew are competitive or better.

2-cycles

• The method R4 in the LOFS box can actually detect 2-cycles in the model fairly reliably, for small models with large samples.

• Procedure: for each edge E = X---Y in S, pick endpoints for X and Y.

  • If NG(eX|Y) > NG(X)
    • Set the endpoint of E at X to ARROW
  • Else
    • Set the endpoint of E at X to TAIL

  • If NG(eY|X) > NG(Y)
    • Set the endpoint of E at Y to ARROW
  • Else
    • Set the endpoint of E at Y to TAIL

• In simple cases, X->Y means X is a cause of Y, and X<->Y or X---Y means X and Y are in a 2-cycle.
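
The endpoint rule can be sketched for a single acyclic pair as follows. Two things here are assumptions rather than the slide's method: eX|Y is read as the residual of X regressed on Y, and NG is approximated by absolute excess kurtosis instead of the Anderson-Darling statistic. The function names, the coefficient 0.8, and the U(0, 1) errors are illustrative only.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def residual(target, predictor):
    """Residual of a least-squares regression of target on predictor."""
    mt, mp = mean(target), mean(predictor)
    cov = mean([(t - mt) * (p - mp) for t, p in zip(target, predictor)])
    var = mean([(p - mp) ** 2 for p in predictor])
    b = cov / var
    return [(t - mt) - b * (p - mp) for t, p in zip(target, predictor)]


def ng(xs):
    """Stand-in non-Gaussianity score: absolute excess kurtosis (assumption)."""
    m = mean(xs)
    var = mean([(x - m) ** 2 for x in xs])
    fourth = mean([(x - m) ** 4 for x in xs])
    return abs(fourth / var ** 2 - 3.0)


def endpoint_at(x, other):
    """ARROW at x if its residual given `other` is more non-Gaussian than x itself."""
    return "ARROW" if ng(residual(x, other)) > ng(x) else "TAIL"


rng = random.Random(2)
n = 50_000
x = [rng.random() for _ in range(n)]        # X = E_X, uniform error
y = [0.8 * a + rng.random() for a in x]     # Y = 0.8 * X + E_Y

end_x = endpoint_at(x, y)
end_y = endpoint_at(y, x)
print(end_x, end_y)  # TAIL at X and ARROW at Y reads as X->Y
```

Regressing Y on X strips out the X contribution and leaves something close to the pure error, which is more non-Gaussian than Y; regressing X on Y only mixes in more noise, so X keeps a tail.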

Demo

• Consider the model at the bottom of the lingam.tet session.

• I’ve added a 2-cycle to the graph and selected R4 in the LOFS box.

• The 2-cycle is recovered in the LOFS box.

• You can right click on the Generalized SEM PM box and select “Simulate” to run it again with new data.

Thanks!
