
TRANSFER ENTROPY as a causality indicator

Fatimah Abdul Razak¹,² and Prof. Henrik Jeldtoft Jensen¹

¹Complexity and Networks Group & Department of Mathematics, Imperial College London

²School of Mathematical Sciences, Universiti Kebangsaan Malaysia (National University of Malaysia)

BAMC 2012, March 27, 2012

Outline

Causality measures

Transfer Entropy

Simple model

General model

Some preliminary results from data

Take home message


Causality Measure: The Idea

"If we can measure degrees of causality ... We can then observe how much a change in one aspect of the universe will bring out changes in others."

"... I was forced to consider the theory of information, and above all, that partial information which our knowledge of one part of the system gives us of the rest of it."

- Norbert Wiener, I Am A Mathematician (1956)

Variable A 'causes' variable B if the ability to predict B is improved by incorporating information about A in the prediction of B.


Granger Causality (G-Causality): An overview

Say we want to know if variable Y Granger causes variable X.

Fit variables X and Y into a model (usually autoregressive) with some method (usually standard linear regression).

Test the model fit using a criterion, e.g. the Bayesian Information Criterion.

Predict X_{t+1} using only past terms of X.

Predict X_{t+1} using past terms of X and Y as well.

Determine if the latter is significantly better than the former (usually this is done using the F-test).

If true, then Y Granger causes X (see the sketch below).

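A minimal sketch of this procedure in Python, assuming the statsmodels package is available; the data here are synthetic, constructed so that Y drives X with a one-step delay:

```python
# A minimal sketch with statsmodels (assumed installed): synthetic data
# where Y drives X with a one-step delay, then the Granger test.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
T = 1000
y = rng.normal(size=T)
x = np.zeros(T)
for t in range(1, T):
    # X depends on its own past and on Y one step back.
    x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + 0.1 * rng.normal()

# Column order matters: the test asks whether the SECOND column helps
# predict the FIRST. It prints a summary and returns results per lag.
results = grangercausalitytests(np.column_stack([x, y]), maxlag=2)
p_value = results[1][0]["ssr_ftest"][1]   # F-test p-value at lag 1
print(p_value)                            # small => Y Granger causes X
```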

G-Causality: the caveat

First introduced by Clive Granger (1969) in the context of the linear autoregressive (AR) model, G-causality is the most commonly used 'causality' indicator. It is known to

depend on how well the model fits, which is not trivial.

capture linear features only.

Thus, we seek a model-independent method that can capture non-linearity; to this end, we turn our attention to Markov processes.

Markov Processes

A Markov process is a random process that retains no memory of where it has been in the past. If X is a Markov process with possible values x_i, i = 1, ..., n, then

P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \dots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n).

This property is called the Markov property. Similarly, a Markov process of order k is defined as a random process that retains memory of only k steps in the past, so that

P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \dots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_{n-k} = x_{n-k}, \dots, X_n = x_n).

Let X^{(k)}_n = (X_{n-k} = x_{n-k}, \dots, X_n = x_n), so that the Markov property can be written as P(X_{n+1} = x_{n+1} \mid X^{(n-1)}_n) = P(X_{n+1} = x_{n+1} \mid X^{(k)}_n).
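As a quick illustration (our own example, not from the slides), the Markov property can be checked empirically on a simulated two-state chain: conditioning on the extra past value X_{n-1} should not change P(X_{n+1} | X_n):

```python
# Quick empirical check of the Markov property on a two-state chain.
import numpy as np

rng = np.random.default_rng(1)
T, p_flip = 200_000, 0.25            # P(X_{n+1} != X_n) = 0.25
x = np.ones(T, dtype=int)
for n in range(1, T):
    x[n] = -x[n - 1] if rng.random() < p_flip else x[n - 1]

nxt, cur, prev = x[2:], x[1:-1], x[:-2]
print((nxt[cur == 1] == 1).mean())                   # P(1 | X_n=1)              ~ 0.75
print((nxt[(cur == 1) & (prev == -1)] == 1).mean())  # P(1 | X_n=1, X_{n-1}=-1)  ~ 0.75
```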

Transfer Entropy

The idea is to quantify this ratio (where X is a Markov process of order k and Y is a Markov process of order l),

r_{X^{(k)}_n, Y^{(l)}_n} = \frac{P(X_{n+1} = x_{n+1} \mid X^{(k)}_n, Y^{(l)}_n)}{P(X_{n+1} = x_{n+1} \mid X^{(k)}_n)},

to see whether conditioning on past values of Y affects the prediction of X, in line with Wiener's definition.

The Transfer Entropy from Y to X, TE_{Y \to X}, is defined as

TE_{Y \to X} = \sum_{x_{n+1},\, x^{(k)}_n,\, y^{(l)}_n} P(X_{n+1} = x_{n+1}, Y^{(l)}_n, X^{(k)}_n) \log r_{X^{(k)}_n, Y^{(l)}_n},

where 0 \log 0 = 0. This measure was first introduced by T. Schreiber in his paper Measuring Information Transfer, PRL (2000).

Transfer Entropy: Simplest form

Most current applications of Transfer Entropy utilize large values of k and l, but Schreiber himself warns against conditioning on too many variables.

When k = 1 and l = 1, we get

r_{X_n, Y_n} = \frac{P(X_{n+1} = x_{n+1} \mid X_n = x_n, Y_n = y_n)}{P(X_{n+1} = x_{n+1} \mid X_n = x_n)},

so that the Transfer Entropy now becomes

TE_{Y \to X} = \sum_{x_{n+1}, x_n, y_n} P(X_{n+1} = x_{n+1}, X_n = x_n, Y_n = y_n) \log r_{X_n, Y_n}.
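A minimal plug-in estimator for this k = l = 1 case (our own naming, a sketch rather than the authors' code): it assumes discrete-valued series and estimates all probabilities by simple counting, the histogram approach used later in the talk. Natural logarithm, so TE is in nats:

```python
# Plug-in estimator for TE_{Y->X} with k = l = 1 on discrete series.
import numpy as np

def transfer_entropy(x, y):
    """Estimate TE_{Y->X} from two discrete time series of equal length."""
    x, y = np.asarray(x), np.asarray(y)
    triples = np.stack([x[1:], x[:-1], y[:-1]], axis=1)  # (x_{n+1}, x_n, y_n)
    vals, counts = np.unique(triples, axis=0, return_counts=True)
    p_xxy = dict(zip(map(tuple, vals), counts / counts.sum()))

    def marginal(keep):
        m = {}
        for k, p in p_xxy.items():
            kk = tuple(k[i] for i in keep)
            m[kk] = m.get(kk, 0.0) + p
        return m

    p_xx = marginal((0, 1))   # P(x_{n+1}, x_n)
    p_x  = marginal((1,))     # P(x_n)
    p_xy = marginal((1, 2))   # P(x_n, y_n)

    te = 0.0
    for (x1, x0, y0), p in p_xxy.items():
        # r = P(x_{n+1} | x_n, y_n) / P(x_{n+1} | x_n)
        r = (p / p_xy[(x0, y0)]) / (p_xx[(x1, x0)] / p_x[(x0,)])
        te += p * np.log(r)   # 0 log 0 terms never appear: p > 0 here
    return te
```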

Transfer Entropy: Detecting time lags

Note that the calculation remains just as simple if we define

r^{(\tau)}_{X_n, Y_{n-\tau}} = \frac{P(X_{n+1} = x_{n+1} \mid X_n = x_n, Y_{n-\tau} = y_{n-\tau})}{P(X_{n+1} = x_{n+1} \mid X_n = x_n)},

since we are now just conditioning on Y, τ time steps before n.

Transfer Entropy now becomes

TE^{(\tau)}_{Y \to X} = \sum_{x_{n+1}, x_n, y_{n-\tau}} P(X_{n+1} = x_{n+1}, X_n = x_n, Y_{n-\tau} = y_{n-\tau}) \log r^{(\tau)}_{X_n, Y_{n-\tau}}.
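A lag scan is then a one-liner, reusing the hypothetical transfer_entropy sketch above; shifting Y by τ aligns the triple (X_{n+1}, X_n, Y_{n-τ}):

```python
# Sketch: TE^{(tau)} for a range of lags, reusing transfer_entropy above.
# The lag with the largest estimate is a candidate for the true delay.
def te_lag_scan(x, y, max_lag):
    # Pair (x_{n+1}, x_n) with y_{n - tau} by trimming both series.
    return [transfer_entropy(x[tau:], y[:len(y) - tau])
            for tau in range(1, max_lag + 1)]
```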

Simple Model

Let X, Y and Z be stochastic processes that can assume values in the set of states A = {-1, 1} at every time step n = 1, ..., T.

Let variables X, Y and Z change at every time step independently of each other with probability 1/2.

We impose a restriction so that X and Y can only change states at each time step n if Z fulfills a certain condition at time step n - t_Z, thus creating some sort of 'dependence' on Z.

Here, let the condition be that Z_{n-t_Z} = 1.

Let Q = P('condition is fulfilled') = P(Z_{n-t_Z} = 1).

In this way, we can say that Z 'causes' X and Y to a certain extent, which is controlled by Q. (A simulation sketch follows.)
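A sketch of this model in Python (our own naming; the flip probabilities and the condition on Z follow the bullet points above, and the final line reuses the transfer_entropy sketch from earlier):

```python
# Sketch of the simple model: X, Y, Z take values in {-1, 1}; Z flips
# with probability 1/2 at every step, while X and Y may flip (each with
# probability 1/2) only if Z_{n - t_z} = 1.
import numpy as np

def simulate_simple_model(T, t_z=1, seed=0):
    rng = np.random.default_rng(seed)
    x, y, z = (np.ones(T, dtype=int) for _ in range(3))
    for n in range(1, T):
        z[n] = -z[n - 1] if rng.random() < 0.5 else z[n - 1]
        if n >= t_z and z[n - t_z] == 1:          # 'the cause' is active
            x[n] = -x[n - 1] if rng.random() < 0.5 else x[n - 1]
            y[n] = -y[n - 1] if rng.random() < 0.5 else y[n - 1]
        else:
            x[n] = x[n - 1]
            y[n] = y[n - 1]
    return x, y, z

x, y, z = simulate_simple_model(100_000)
print(transfer_entropy(x, z))   # ~ (3/2)log2 - (3/4)log3 ≈ 0.216 nats
```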

Simple model, T = 11, t_Z = 1: Simulation

[Figure: sample paths of X, Y and Z over time steps 1-11, each taking values in {-1, 1}.]

Simple model, T = 200, t_Z = 1: Simulation

[Figure: sample paths of X, Y and Z over time steps 1-200, each taking values in {-1, 1}.]

Simple model: Transition probabilities, where Q = P(Z_{n-t_Z} = 1) = 1/2 (still unknown)

When \alpha, \beta \in \{-1, 1\}, then

P(X_n = \alpha \mid X_{n-1} = \beta) =
\begin{cases}
1 - \tfrac{Q}{2} = \tfrac{3}{4}, & \text{if } \alpha = \beta \\
\tfrac{Q}{2} = \tfrac{1}{4}, & \text{if } \alpha \neq \beta
\end{cases}

P(Y_n = \alpha \mid Y_{n-1} = \beta) =
\begin{cases}
1 - \tfrac{Q}{2} = \tfrac{3}{4}, & \text{if } \alpha = \beta \\
\tfrac{Q}{2} = \tfrac{1}{4}, & \text{if } \alpha \neq \beta
\end{cases}

and

P(Z_n = \alpha \mid Z_{n-1} = \beta) =
\begin{cases}
\tfrac{1}{2}, & \text{if } \alpha = \beta \\
\tfrac{1}{2}, & \text{if } \alpha \neq \beta.
\end{cases}

Simple model: Conditioning on Z_{n-t_Z}, 'the cause'

Given that Z_{n-t_Z} = -1, we get Q = P(Z_{n-t_Z} = 1) = 0 and

P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n-t_Z} = -1) =
\begin{cases}
1 - \tfrac{Q}{2} = 1, & \text{if } \alpha = \beta \\
\tfrac{Q}{2} = 0, & \text{if } \alpha \neq \beta
\end{cases}

P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n-t_Z} = -1) =
\begin{cases}
1 - \tfrac{Q}{2} = 1, & \text{if } \alpha = \beta \\
\tfrac{Q}{2} = 0, & \text{if } \alpha \neq \beta,
\end{cases}

and notice that there is no effect on

P(Z_n = \alpha \mid Z_{n-1} = \beta, Z_{n-t_Z} = -1) =
\begin{cases}
\tfrac{1}{2}, & \text{if } \alpha = \beta \\
\tfrac{1}{2}, & \text{if } \alpha \neq \beta.
\end{cases}

Simple model: Conditioning on Z_{n-t_Z}, 'the cause'

Given that Z_{n-t_Z} = 1, we get Q = P(Z_{n-t_Z} = 1) = 1 and

P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n-t_Z} = 1) =
\begin{cases}
1 - \tfrac{Q}{2} = \tfrac{1}{2}, & \text{if } \alpha = \beta \\
\tfrac{Q}{2} = \tfrac{1}{2}, & \text{if } \alpha \neq \beta
\end{cases}

P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n-t_Z} = 1) =
\begin{cases}
1 - \tfrac{Q}{2} = \tfrac{1}{2}, & \text{if } \alpha = \beta \\
\tfrac{Q}{2} = \tfrac{1}{2}, & \text{if } \alpha \neq \beta,
\end{cases}

and notice that there is no effect on

P(Z_n = \alpha \mid Z_{n-1} = \beta, Z_{n-t_Z} = 1) =
\begin{cases}
\tfrac{1}{2}, & \text{if } \alpha = \beta \\
\tfrac{1}{2}, & \text{if } \alpha \neq \beta.
\end{cases}

Simple model probabilities: r^{(\tau)}_{X_n, Z_{n-\tau}} and r^{(\tau)}_{Y_n, Z_{n-\tau}}

\frac{P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n-t_Z} = -1)}{P(X_n = \alpha \mid X_{n-1} = \beta)} =
\begin{cases}
\frac{1}{3/4} = \frac{4}{3}, & \text{if } \alpha = \beta \\
0, & \text{if } \alpha \neq \beta,
\end{cases}

\frac{P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n-t_Z} = -1)}{P(Y_n = \alpha \mid Y_{n-1} = \beta)} =
\begin{cases}
\frac{1}{3/4} = \frac{4}{3}, & \text{if } \alpha = \beta \\
0, & \text{if } \alpha \neq \beta,
\end{cases}

\frac{P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n-t_Z} = 1)}{P(X_n = \alpha \mid X_{n-1} = \beta)} =
\begin{cases}
\frac{1/2}{3/4} = \frac{2}{3}, & \text{if } \alpha = \beta \\
\frac{1/2}{1/4} = 2, & \text{if } \alpha \neq \beta,
\end{cases}

and

\frac{P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n-t_Z} = 1)}{P(Y_n = \alpha \mid Y_{n-1} = \beta)} =
\begin{cases}
\frac{1/2}{3/4} = \frac{2}{3}, & \text{if } \alpha = \beta \\
\frac{1/2}{1/4} = 2, & \text{if } \alpha \neq \beta.
\end{cases}
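These conditional probabilities can be checked empirically on a simulated run, reusing the hypothetical simulate_simple_model sketch from earlier:

```python
# Sketch: empirical check of the conditional probabilities above.
import numpy as np

x, y, z = simulate_simple_model(500_000, t_z=1, seed=3)
stay = (x[1:] == x[:-1])          # did X stay put over each transition?
z_prev = (z[:-1] == 1)            # was Z_{n-1} = 1, i.e. the cause active?
print(stay.mean())                # P(X_n = X_{n-1})             ~ 3/4
print(stay[~z_prev].mean())       # P(stay | Z_{n-1} = -1)       ~ 1
print(stay[z_prev].mean())        # P(stay | Z_{n-1} = 1)        ~ 1/2
```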

Transfer Entropy (TE) of the simple model

The previous probabilities were in the form of

r^{(\tau)}_{X_n, Z_{n-\tau}} = \frac{P(X_n = x_n \mid X_{n-1} = x_{n-1}, Z_{n-\tau} = z_{n-\tau})}{P(X_n = x_n \mid X_{n-1} = x_{n-1})},

which we use to get

TE^{(\tau)}_{Z \to X} = \sum_{x_n, x_{n-1}, z_{n-\tau}} P(X_n = x_n, X_{n-1} = x_{n-1}, Z_{n-\tau} = z_{n-\tau}) \log r^{(\tau)}_{X_n, Z_{n-\tau}}.

Note that r^{(\tau)}_{X_n, Y_{n-\tau}} = r^{(\tau)}_{Z_n, X_{n-\tau}} = r^{(\tau)}_{Z_n, Y_{n-\tau}} = r^{(\tau)}_{Y_n, X_{n-\tau}} = 1. Therefore we know that TE^{(\tau)}_{Y \to X} = TE^{(\tau)}_{X \to Z} = TE^{(\tau)}_{Y \to Z} = TE^{(\tau)}_{X \to Y} = 0.

For \tau \neq t_Z, all the TE values are equal to 0. In this way, we can detect the actual time lag t_Z, since the only non-zero values are

TE^{(t_Z)}_{Z \to X} = TE^{(t_Z)}_{Z \to Y} = \frac{3}{2}\log 2 - \frac{3}{4}\log 3 \neq 0.
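For completeness, this value follows by expanding the sum over the four (Z, transition) cases, using the probabilities and ratios from the previous slides; the numeric value assumes the natural logarithm:

```latex
\begin{align*}
TE^{(t_Z)}_{Z \to X}
  &= \underbrace{\tfrac{1}{2}\cdot 1}_{z=-1,\ \text{stay}} \log\tfrac{1}{3/4}
   + \underbrace{\tfrac{1}{2}\cdot 0}_{z=-1,\ \text{flip}} \log 0
   + \underbrace{\tfrac{1}{4}}_{z=1,\ \text{stay}} \log\tfrac{1/2}{3/4}
   + \underbrace{\tfrac{1}{4}}_{z=1,\ \text{flip}} \log\tfrac{1/2}{1/4} \\
  &= \tfrac{1}{2}\log\tfrac{4}{3} + \tfrac{1}{4}\log\tfrac{2}{3} + \tfrac{1}{4}\log 2
   = \tfrac{3}{2}\log 2 - \tfrac{3}{4}\log 3 \approx 0.216\ \text{nats},
\end{align*}
```

where the second term vanishes by the convention 0 log 0 = 0.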

A more General Model

As in the simple model, let X, Y and Z be stochastic processes that can assume values in the set of states A at every time step n = 1, ..., T.

BUT now define n_s as the number of states we have in the model, and define A = {1, ..., n_s} as the set of possible states.

Let variables X, Y and Z change at every time step independently of each other with uniform probability 1/n_s.

Similar to the simple model, we impose a restriction so that X and Y can only change states at time n if Z fulfills a certain condition at time step n - t_Z.

Let Q = P('condition is fulfilled'). The analytical values can be worked out as in the simple model. (A simulation sketch follows.)
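A sketch of the general model (our own naming), here with the Case 1 condition Z_{n-1} = 1 from the next slides, so that Q = 1/n_s there; the last line reuses the transfer_entropy sketch from earlier:

```python
# Sketch of the general model with n_s states: Z resamples uniformly
# over {1, ..., n_s} at every step; X may resample only when the
# condition Z_{n - t_z} = 1 was fulfilled (Case 1, so Q = 1/n_s).
import numpy as np

def simulate_general_model(T, ns, t_z=1, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.integers(1, ns + 1, size=T)
    x = np.ones(T, dtype=int)
    for n in range(1, T):
        if n >= t_z and z[n - t_z] == 1:
            x[n] = rng.integers(1, ns + 1)
        else:
            x[n] = x[n - 1]
    return x, z

x, z = simulate_general_model(100_000, ns=10)
print(transfer_entropy(x, z))   # compare with the analytic curves below
```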

General model, Case 1: Q = P(Z_{n-1} = 1). Simulation with t_Z = 1

[Figure: TE_{Z→X} vs n_s; analytic curve against simulations with T = 1000, 10000 and 100000.]

General model, Case 2: Q = P(Z_{n-1} ≠ 1). Simulation with t_Z = 1

[Figure: TE_{Z→X} vs n_s; analytic curve against simulations with T = 1000, 10000 and 100000.]

General model, Case 3: Q ≃ 1/2. Simulation with t_Z = 1

[Figure: TE_{Z→X} vs n_s; analytic curve against simulations with T = 1000, 10000 and 100000.]

Independent case: Null model simulation

[Figure: TE_{Z→X} vs n_s for independent processes; analytic value against simulations with T = 1K, 10K, 100K and 1M.]

Independent case: Null model simulation (logarithmic scale)

[Figure: as the previous slide, with TE_{Z→X} on a logarithmic scale from 10^{-6} to 10^{1}.]

Some preliminary results: How this translates to data applications

When using large n_s (for binning purposes), the existence of this spurious non-zero Transfer Entropy might lead to incorrect conclusions when applying the measure to data sets (the sketch below illustrates the effect).

For our application, we simply use histograms for binning in order to estimate the probabilities. We used n_s = 20 in the coming graphs, with T = 20000.
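A sketch of this finite-sample effect, reusing the transfer_entropy estimator from earlier: two independent series have true TE = 0, but the plug-in estimate is biased upwards, more so for large n_s and small T:

```python
# Sketch: spurious TE between independent series from finite samples.
import numpy as np

rng = np.random.default_rng(2)
for ns in (2, 10, 20, 50):
    for T in (1_000, 100_000):
        x = rng.integers(ns, size=T)       # independent uniform series
        y = rng.integers(ns, size=T)
        print(f"ns={ns:3d}  T={T:6d}  TE={transfer_entropy(x, y):.4f}")
```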

EEG (Electroencephalogram) Data

EEG recordings by Bjorn Cruts of the piano player, David Dolan (Guildhall School of Music and Drama), and the listener, Prof. Henrik Jensen. The pianist was asked to play at varying levels of musical interpretation and improvisation, namely:

Piece 1: Schubert Impromptu in G flat major Op. 90 No. 3, neutral mode, uninvolved

Piece 2: Schubert Impromptu in G flat major Op. 90 No. 3, fully involved

Piece 3: Improvisation, polyphonic, intellectual exercise

Piece 4: Improvisation, polyphonic, emotional letting go

Parts of the brain

The EEG was recorded at 8 locations on both the player and the listener. These locations are the left and right

frontal cortex: attention, planning, working memory and inhibition

central cortex: controlling movements

temporal cortex: processing of sounds, language and multi-sensory integration

parietal cortex: visual processing, spatial positioning and short-term memory

The results displayed here use Transfer Entropy with k = 1 and l = 1.

The nodes

1 Right frontal cortex (RF)

2 Left frontal cortex (LF)

3 Right central cortex (RC)

4 Left central cortex (LC)

5 Right temporal cortex (RT)

6 Left temporal cortex (LT)

7 Right parietal cortex (RP)

8 Left parietal cortex (LP)

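How such node-by-node TE matrices might be assembled (a hypothetical sketch, not the authors' code): `eeg` is an assumed (T × 8) array of channel recordings, binned into n_s = 20 histogram states as described earlier and fed to the transfer_entropy estimator from before:

```python
# Hypothetical sketch: an 8x8 TE matrix over EEG channels.
import numpy as np

def discretize(series, ns=20):
    # Map real-valued samples to ns histogram bins (states 0..ns-1).
    edges = np.histogram_bin_edges(series, bins=ns)
    return np.clip(np.digitize(series, edges[1:-1]), 0, ns - 1)

def te_matrix(eeg, ns=20):
    T, n_ch = eeg.shape
    binned = [discretize(eeg[:, c], ns) for c in range(n_ch)]
    te = np.zeros((n_ch, n_ch))
    for i in range(n_ch):              # affected node
        for j in range(n_ch):          # CAUSAL node, as in the figures
            if i != j:
                te[i, j] = transfer_entropy(binned[i], binned[j])
    return te
```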

Transfer Entropy on player

[Figure: four 8×8 Transfer Entropy matrices over the player's nodes, one per Piece 1-4; columns are the CAUSAL nodes, with a colour bar of TE values for each panel.]

Transfer Entropy on listener

[Figure: four 8×8 Transfer Entropy matrices over the listener's nodes, one per Piece 1-4; columns are the CAUSAL nodes, with a colour bar of TE values for each panel.]

The nodes

1 Player's Right frontal cortex
2 Player's Left frontal cortex
3 Player's Right central cortex
4 Player's Left central cortex
5 Player's Right temporal cortex
6 Player's Left temporal cortex
7 Player's Right parietal cortex
8 Player's Left parietal cortex
9 Listener's Right frontal cortex
10 Listener's Left frontal cortex
11 Listener's Right central cortex
12 Listener's Left central cortex
13 Listener's Right temporal cortex
14 Listener's Left temporal cortex
15 Listener's Right parietal cortex
16 Listener's Left parietal cortex

Transfer entropy of both player and listener

[Figure: four 16×16 Transfer Entropy matrices over all nodes of the player (1-8) and listener (9-16), one per Piece 1-4; columns are the CAUSAL nodes, with a colour bar of TE values for each panel.]

How the listener is influenced by the player: zoom into the lower left-hand corner of the Transfer Entropy matrix

[Figure: Transfer Entropy from the player's nodes (CAUSAL nodes 1-8) to the listener's nodes (9-16) for Pieces 1-4.]

How the player is influenced by the listener: zoom into the upper right-hand corner of the Transfer Entropy matrix

[Figure: Transfer Entropy from the listener's nodes (CAUSAL nodes 9-16) to the player's nodes (1-8) for Pieces 1-4.]

Take home message: What you need to remember if nothing else

We investigated the simplest form of Transfer Entropy and verified on simplistic models that it does indeed detect 'causal' interactions, and that it also has the potential to detect time lags.

We recommend the use of this measure but caution the user about its pitfalls, especially when detecting causality in EEG data sets, namely:

a lack of statistics, which might give rise to spurious non-zero Transfer Entropy values;

conditioning on too many variables (Schreiber himself warns against this) and drawing conclusions from non-zero Transfer Entropy values alone.

Transfer Entropy, if applied correctly, has the potential to be a very accurate and easily applicable 'causality' measure.

References

T. Schreiber, Measuring Information Transfer, Phys. Rev. Lett. 85, 461 (2000)

L. Barnett, A. B. Barrett and A. K. Seth, Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables, Phys. Rev. Lett. 103, 238701 (2009)

S. L. Bressler and A. K. Seth, Wiener-Granger Causality: A Well Established Methodology, NeuroImage (2010), doi:10.1016/j.neuroimage.2010.02.059

S. Frenzel and B. Pompe, Partial Mutual Information for Coupling Analysis of Multivariate Time Series, Phys. Rev. Lett. 99, 204101 (2007)

T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York (1999)
