Transfer Entropy as a Causality Indicator

Fatimah Abdul Razak(1,2) and Prof. Henrik Jeldtoft Jensen(1)

(1) Complexity and Networks Group & Department of Mathematics, Imperial College London
(2) School of Mathematical Sciences, Universiti Kebangsaan Malaysia (National University of Malaysia)

BAMC 2012, March 27, 2012
Outline
Causality measures
Transfer Entropy
Simple model
General model
Some preliminary results from data
Take home message
Fatimah A. Razak (Imperial College) [email protected] BAMC 2012 2 / 35
Causality Measure: The Idea

"If we can measure degrees of causality... We can then observe how much a change in one aspect of the universe will bring out changes in others."

"... I was forced to consider the theory of information, and above all, that partial information which our knowledge of one part of the system gives us of the rest of it."

- Norbert Wiener, I Am A Mathematician (1956)

Variable A 'causes' variable B if the ability to predict B is improved by incorporating information about A in the prediction of B.
Granger Causality (G-Causality): An Overview
Say we want to know whether variable Y Granger causes variable X.

Fit variables X and Y into a model (usually autoregressive) using some method (usually standard linear regression).

Test the model fit using a criterion, e.g. the Bayesian Information Criterion.

Predict X_{t+1} using only past terms of X.

Predict X_{t+1} using past terms of both X and Y.

Determine whether the latter is significantly better than the former (usually via an F-test).

If so, then Y Granger causes X.
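As a rough illustration of these steps, the sketch below fits the restricted (X only) and full (X and Y) one-lag models by ordinary least squares on zero-mean series and compares residual sums of squares through an F-type statistic. This is a minimal sketch, not code from the talk: the function names are ours, the intercept is omitted, and the final comparison against an F distribution for significance is left out.

```python
import random

def fit_ls(X, y):
    # Solve the normal equations (X^T X) b = X^T y for a 2-column design matrix.
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    t1 = sum(r[0] * yi for r, yi in zip(X, y))
    t2 = sum(r[1] * yi for r, yi in zip(X, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

def granger_f(x, y):
    """F-type statistic for 'Y Granger causes X' with one lag, zero-mean series."""
    n = len(x) - 1
    # Restricted model: x_t = a * x_{t-1}
    a = sum(x[t] * x[t - 1] for t in range(1, len(x))) / sum(v * v for v in x[:-1])
    rss_r = sum((x[t] - a * x[t - 1]) ** 2 for t in range(1, len(x)))
    # Full model: x_t = a * x_{t-1} + b * y_{t-1}
    a2, b2 = fit_ls([(x[t - 1], y[t - 1]) for t in range(1, len(x))], x[1:])
    rss_f = sum((x[t] - a2 * x[t - 1] - b2 * y[t - 1]) ** 2 for t in range(1, len(x)))
    # One extra parameter in the full model, n - 2 residual degrees of freedom
    return (rss_r - rss_f) / (rss_f / (n - 2))

random.seed(0)
y = [random.gauss(0, 1) for _ in range(2000)]
x = [0.0]
for t in range(1, 2000):
    x.append(0.5 * y[t - 1] + random.gauss(0, 0.5))  # x is driven by lagged y

print(granger_f(x, y))   # large: past y clearly helps predict x
print(granger_f(y, x))   # small: past x carries no extra information about y
```

In a real analysis the statistic would be compared with the appropriate F distribution to obtain a p-value.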
G-Causality: The Caveat

First introduced by Clive Granger (1969) in the context of the linear autoregressive (AR) model, G-causality is the most commonly used 'causality' indicator. It is known to

depend on how well the model fits, which is not trivial;
capture linear features only.

Thus we seek a model-independent method that can capture non-linearity; to this end, we turn our attention to Markov processes.
Markov Processes

A Markov process is a random process that retains no memory of where it has been in the past. If X is a Markov process with possible values x_i, i = 1, ..., n, then

P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n).

This property is called the Markov property. Similarly, a Markov process of order k is defined as a random process that retains memory of only k steps in the past, so that

P(X_{n+1} = x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_{n-k} = x_{n-k}, \ldots, X_n = x_n).

Let X_n^{(k)} = (X_{n-k} = x_{n-k}, \ldots, X_n = x_n), so that the order-k Markov property can be written as

P(X_{n+1} = x_{n+1} \mid X_n^{(n-1)}) = P(X_{n+1} = x_{n+1} \mid X_n^{(k)}).
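As an aside, the Markov property can be checked empirically on a simulated chain: the next-step distribution should not depend on the state two steps back. A minimal stdlib-Python sketch (the 0.7/0.3 transition probabilities are an arbitrary choice of ours, not from the talk):

```python
import random
from collections import Counter

random.seed(1)
# Two-state chain on {-1, 1}: stay with probability 0.7, flip with 0.3
chain = [1]
for _ in range(200_000):
    chain.append(chain[-1] if random.random() < 0.7 else -chain[-1])

# Empirical P(X_{n+1} = 1 | X_n = 1, X_{n-1} = h) for both histories h
pairs = Counter()   # counts of (X_{n-1} = h, X_n = 1)
trips = Counter()   # counts of (X_{n-1} = h, X_n = 1, X_{n+1} = 1)
for a, b, c in zip(chain, chain[1:], chain[2:]):
    if b == 1:
        pairs[a] += 1
        if c == 1:
            trips[a] += 1

for h in (-1, 1):
    print(h, round(trips[h] / pairs[h], 3))   # both close to 0.7: history h is irrelevant
```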
Transfer Entropy

The idea is to quantify the ratio (where X is a Markov process of order k and Y is a Markov process of order l)

r_{X_n^{(k)}, Y_n^{(l)}} = \frac{P(X_{n+1} = x_{n+1} \mid X_n^{(k)}, Y_n^{(l)})}{P(X_{n+1} = x_{n+1} \mid X_n^{(k)})},

to see whether conditioning on past values of Y affects the prediction of X, in line with Wiener's definition.

The Transfer Entropy from Y to X, TE_{Y \to X}, is defined as

TE_{Y \to X} = \sum_{x_{n+1}} \sum_{x_n} \sum_{y_n} P(X_{n+1} = x_{n+1}, Y_n^{(l)}, X_n^{(k)}) \log r_{X_n^{(k)}, Y_n^{(l)}},

where 0 \log 0 = 0. This measure was first introduced by T. Schreiber in his paper Measuring Information Transfer, PRL (2000).
Transfer Entropy: Simplest Form

Most current applications of Transfer Entropy use large values of k and l, but Schreiber himself warns against conditioning on too many variables.

When k = 1 and l = 1, we get

r_{X_n, Y_n} = \frac{P(X_{n+1} = x_{n+1} \mid X_n = x_n, Y_n = y_n)}{P(X_{n+1} = x_{n+1} \mid X_n = x_n)},

so that the Transfer Entropy becomes

TE_{Y \to X} = \sum_{x_{n+1}} \sum_{x_n} \sum_{y_n} P(X_{n+1} = x_{n+1}, X_n = x_n, Y_n = y_n) \log r_{X_n, Y_n}.
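A plug-in estimate of this k = l = 1 Transfer Entropy for discrete time series can be written directly from the formula, replacing each probability by its empirical frequency. This is a minimal illustrative sketch, not code from the talk; the function name `transfer_entropy` and the choice of base-2 logarithms (bits) are ours.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate of TE_{Y->X} with k = l = 1 for discrete series."""
    n = len(x) - 1
    joint = Counter(zip(x[1:], x[:-1], y[:-1]))   # (x_{n+1}, x_n, y_n)
    pair_xy = Counter(zip(x[:-1], y[:-1]))        # (x_n, y_n)
    pair_xx = Counter(zip(x[1:], x[:-1]))         # (x_{n+1}, x_n)
    single = Counter(x[:-1])                      # x_n
    te = 0.0
    for (x1, x0, y0), c in joint.items():
        # r = P(x_{n+1} | x_n, y_n) / P(x_{n+1} | x_n), estimated from counts
        r = (c / pair_xy[(x0, y0)]) / (pair_xx[(x1, x0)] / single[x0])
        te += (c / n) * math.log2(r)
    return te

random.seed(2)
y = [random.randint(0, 1) for _ in range(50_000)]
x = [0] + y[:-1]                          # x simply copies y one step later
print(round(transfer_entropy(x, y), 2))   # close to 1 bit: y fully determines x
print(round(transfer_entropy(y, x), 2))   # close to 0: x adds nothing about y
```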
Transfer Entropy: Detecting Time Lags

Note that the calculation remains just as simple if we define

r^{(\tau)}_{X_n, Y_{n-\tau}} = \frac{P(X_{n+1} = x_{n+1} \mid X_n = x_n, Y_{n-\tau} = y_{n-\tau})}{P(X_{n+1} = x_{n+1} \mid X_n = x_n)},

since we are now simply conditioning on Y, \tau time steps before n.

The Transfer Entropy then becomes

TE^{(\tau)}_{Y \to X} = \sum_{x_{n+1}} \sum_{x_n} \sum_{y_{n-\tau}} P(X_{n+1} = x_{n+1}, X_n = x_n, Y_{n-\tau} = y_{n-\tau}) \log r^{(\tau)}_{X_n, Y_{n-\tau}}.
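A lag-scanning estimator follows the same recipe, shifting the candidate cause by τ before counting. The self-contained sketch below (illustrative, stdlib Python only; `te_lag` is our name) wires x to copy y three steps later; because the estimator already looks one step ahead (X_{n+1} against Y_{n-τ}), the scan peaks at τ = 2.

```python
import math
import random
from collections import Counter

def te_lag(x, y, tau):
    """TE_{Y->X} using Y_{n-tau} as the candidate cause of X_{n+1} (k = l = 1)."""
    triples = [(x[t + 1], x[t], y[t - tau]) for t in range(tau, len(x) - 1)]
    n = len(triples)
    joint = Counter(triples)
    pair_xy = Counter((x0, y0) for _, x0, y0 in triples)
    pair_xx = Counter((x1, x0) for x1, x0, _ in triples)
    single = Counter(x0 for _, x0, _ in triples)
    te = 0.0
    for (x1, x0, y0), c in joint.items():
        r = (c / pair_xy[(x0, y0)]) / (pair_xx[(x1, x0)] / single[x0])
        te += (c / n) * math.log2(r)
    return te

random.seed(3)
y = [random.choice((-1, 1)) for _ in range(50_000)]
x = [random.choice((-1, 1))] * 3 + y[:-3]   # x copies y three steps later
for tau in range(5):
    print(tau, round(te_lag(x, y, tau), 3))  # only tau = 2 gives a large value
```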
Simple Model

Let X, Y and Z be stochastic processes that can assume values in the set of states A = {-1, 1} at every time step n = 1, ..., T.

Let variables X, Y and Z change at every time step, independently of each other, with probability 1/2.

We impose a restriction so that X and Y can only change states at time step n if Z fulfils a certain condition at time step n - t_Z, thus creating a form of 'dependence' on Z.

Here, let the condition be that Z_{n - t_Z} = 1.

Let Q = P('condition is fulfilled') = P(Z_{n - t_Z} = 1).

In this way, we can say that Z 'causes' X and Y to a certain extent, which is controlled by Q.
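The construction above is easy to simulate. Below is a sketch under our reading of the rules (Z flips freely each step; X and Y draw a fresh coin flip only when Z_{n - t_Z} = 1); the function name `simulate` and the initial states are our choices.

```python
import random

def simulate(T, tz, seed=0):
    """One run of the simple model: X and Y may flip only when Z_{n - tz} = 1."""
    rng = random.Random(seed)
    X, Y, Z = [1], [1], [1]
    for n in range(1, T):
        Z.append(Z[-1] if rng.random() < 0.5 else -Z[-1])   # Z flips freely
        allowed = n - tz >= 0 and Z[n - tz] == 1            # the condition on Z
        X.append(-X[-1] if allowed and rng.random() < 0.5 else X[-1])
        Y.append(-Y[-1] if allowed and rng.random() < 0.5 else Y[-1])
    return X, Y, Z

X, Y, Z = simulate(100_000, 1, seed=4)
flip_rate = sum(a != b for a, b in zip(X, X[1:])) / (len(X) - 1)
print(round(flip_rate, 2))   # about Q/2 = 1/4: X flips only when Z allows it
```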
Simple model, T = 11, t_Z = 1: Simulation

[Figure: time series of X, Y and Z over time steps 1 to 11, each taking values in {-1, 1}.]
Simple model, T = 200, t_Z = 1: Simulation

[Figure: time series of X, Y and Z over time steps 0 to 200, each taking values in {-1, 1}.]
Simple Model: Transition Probabilities, where Q = P(Z_{n - t_Z} = 1) = 1/2 (still unknown)

When \alpha, \beta \in \{-1, 1\}, then

P(X_n = \alpha \mid X_{n-1} = \beta) = \begin{cases} 1 - \frac{Q}{2} = \frac{3}{4}, & \text{if } \alpha = \beta \\ \frac{Q}{2} = \frac{1}{4}, & \text{if } \alpha \neq \beta \end{cases}

P(Y_n = \alpha \mid Y_{n-1} = \beta) = \begin{cases} 1 - \frac{Q}{2} = \frac{3}{4}, & \text{if } \alpha = \beta \\ \frac{Q}{2} = \frac{1}{4}, & \text{if } \alpha \neq \beta \end{cases}

and

P(Z_n = \alpha \mid Z_{n-1} = \beta) = \begin{cases} \frac{1}{2}, & \text{if } \alpha = \beta \\ \frac{1}{2}, & \text{if } \alpha \neq \beta. \end{cases}
Simple Model: Conditioning on Z_{n - t_Z}, 'the cause'

Given that Z_{n - t_Z} = -1, we get Q = P(Z_{n - t_Z} = 1) = 0 and

P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n - t_Z} = -1) = \begin{cases} 1 - \frac{Q}{2} = 1, & \text{if } \alpha = \beta \\ \frac{Q}{2} = 0, & \text{if } \alpha \neq \beta \end{cases}

P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n - t_Z} = -1) = \begin{cases} 1 - \frac{Q}{2} = 1, & \text{if } \alpha = \beta \\ \frac{Q}{2} = 0, & \text{if } \alpha \neq \beta, \end{cases}

and notice that there is no effect on

P(Z_n = \alpha \mid Z_{n-1} = \beta, Z_{n - t_Z} = -1) = \begin{cases} \frac{1}{2}, & \text{if } \alpha = \beta \\ \frac{1}{2}, & \text{if } \alpha \neq \beta. \end{cases}
Simple Model: Conditioning on Z_{n - t_Z}, 'the cause'

Given that Z_{n - t_Z} = 1, we get Q = P(Z_{n - t_Z} = 1) = 1 and

P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n - t_Z} = 1) = \begin{cases} 1 - \frac{Q}{2} = \frac{1}{2}, & \text{if } \alpha = \beta \\ \frac{Q}{2} = \frac{1}{2}, & \text{if } \alpha \neq \beta \end{cases}

P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n - t_Z} = 1) = \begin{cases} 1 - \frac{Q}{2} = \frac{1}{2}, & \text{if } \alpha = \beta \\ \frac{Q}{2} = \frac{1}{2}, & \text{if } \alpha \neq \beta, \end{cases}

and notice that there is no effect on

P(Z_n = \alpha \mid Z_{n-1} = \beta, Z_{n - t_Z} = 1) = \begin{cases} \frac{1}{2}, & \text{if } \alpha = \beta \\ \frac{1}{2}, & \text{if } \alpha \neq \beta. \end{cases}
Simple Model Probabilities: r^{(\tau)}_{X_n, Z_{n-\tau}} and r^{(\tau)}_{Y_n, Z_{n-\tau}}

\frac{P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n - t_Z} = -1)}{P(X_n = \alpha \mid X_{n-1} = \beta)} = \begin{cases} \frac{1}{3/4} = \frac{4}{3}, & \text{if } \alpha = \beta \\ 0, & \text{if } \alpha \neq \beta, \end{cases}

\frac{P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n - t_Z} = -1)}{P(Y_n = \alpha \mid Y_{n-1} = \beta)} = \begin{cases} \frac{1}{3/4} = \frac{4}{3}, & \text{if } \alpha = \beta \\ 0, & \text{if } \alpha \neq \beta, \end{cases}

\frac{P(X_n = \alpha \mid X_{n-1} = \beta, Z_{n - t_Z} = 1)}{P(X_n = \alpha \mid X_{n-1} = \beta)} = \begin{cases} \frac{1/2}{3/4} = \frac{2}{3}, & \text{if } \alpha = \beta \\ \frac{1/2}{1/4} = 2, & \text{if } \alpha \neq \beta, \end{cases}

and

\frac{P(Y_n = \alpha \mid Y_{n-1} = \beta, Z_{n - t_Z} = 1)}{P(Y_n = \alpha \mid Y_{n-1} = \beta)} = \begin{cases} \frac{1/2}{3/4} = \frac{2}{3}, & \text{if } \alpha = \beta \\ \frac{1/2}{1/4} = 2, & \text{if } \alpha \neq \beta. \end{cases}
Transfer Entropy (TE) of the Simple Model

The previous probabilities are of the form

r^{(\tau)}_{X_n, Z_{n-\tau}} = \frac{P(X_n = x_n \mid X_{n-1} = x_{n-1}, Z_{n-\tau} = z_{n-\tau})}{P(X_n = x_n \mid X_{n-1} = x_{n-1})},

giving

TE^{(\tau)}_{Z \to X} = \sum_{x_n} \sum_{x_{n-1}} \sum_{z_{n-\tau}} P(X_n = x_n, X_{n-1} = x_{n-1}, Z_{n-\tau} = z_{n-\tau}) \log r^{(\tau)}_{X_n, Z_{n-\tau}}.

Note that r^{(\tau)}_{X_n, Y_{n-\tau}} = r^{(\tau)}_{Z_n, X_{n-\tau}} = r^{(\tau)}_{Z_n, Y_{n-\tau}} = r^{(\tau)}_{Y_n, X_{n-\tau}} = 1. Therefore we know that TE^{(\tau)}_{Y \to X} = TE^{(\tau)}_{X \to Z} = TE^{(\tau)}_{Y \to Z} = TE^{(\tau)}_{X \to Y} = 0.

For \tau \neq t_Z, all the TE values are equal to 0. In this way, we can detect the actual time lag t_Z, since the only non-zero values are

TE^{(t_Z)}_{Z \to X} = TE^{(t_Z)}_{Z \to Y} = \frac{3}{2} \log 2 - \frac{3}{4} \log 3 \neq 0.
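The closed form can be cross-checked numerically by summing p · log r over the non-zero entries of the tables above. This small sketch uses base-2 logarithms (our choice; the closed form has the same shape in any base), and the 0 · log 0 terms are simply dropped:

```python
import math

# Joint masses P(X_n, X_{n-1}, Z_{n - t_Z}) and ratios r for Q = 1/2,
# summed over both values of beta; zero-probability terms are dropped.
terms = [
    (1 / 2, 4 / 3),   # Z = -1, X stays:  r = 1 / (3/4)
    (1 / 4, 2 / 3),   # Z = +1, X stays:  r = (1/2) / (3/4)
    (1 / 4, 2.0),     # Z = +1, X flips:  r = (1/2) / (1/4)
]
te = sum(p * math.log2(r) for p, r in terms)
closed_form = 1.5 - 0.75 * math.log2(3)   # (3/2) log 2 - (3/4) log 3 in bits
print(round(te, 6))            # 0.311278
print(round(closed_form, 6))   # 0.311278, the same value
```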
A More General Model

As in the simple model, let X, Y and Z be stochastic processes that can assume values in the set of states A at every time step n = 1, ..., T.

BUT now define n_s as the number of states in the model, and define A = {1, ..., n_s} as the set of possible states.

Let variables X, Y and Z change at every time step, independently of each other, with uniform probability 1/n_s.

Similar to the simple model, we impose a restriction so that X and Y can only change states at time n if Z fulfils a certain condition at time step n - t_Z.

Let Q = P('condition is fulfilled'). The analytical values can be worked out as in the simple model.
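Under one natural reading of these rules (Case 1, where the condition is Z_{n-1} = 1 so Q = 1/n_s, and a 'change' means redrawing the state uniformly), the general model can be sketched as follows; the function name and details of the update rule are our assumptions, not code from the talk.

```python
import random

def simulate_general(T, ns, tz=1, seed=0):
    """General model, Case 1: X and Y redraw uniformly from {1,...,ns}
    only when Z_{n - tz} == 1 (so Q = 1/ns); Z redraws every step."""
    rng = random.Random(seed)
    X, Y, Z = [1], [1], [1]
    for n in range(1, T):
        Z.append(rng.randint(1, ns))
        if n - tz >= 0 and Z[n - tz] == 1:   # condition fulfilled
            X.append(rng.randint(1, ns))
            Y.append(rng.randint(1, ns))
        else:
            X.append(X[-1])
            Y.append(Y[-1])
    return X, Y, Z

X, Y, Z = simulate_general(200_000, 10, seed=5)
move_rate = sum(a != b for a, b in zip(X, X[1:])) / (len(X) - 1)
print(round(move_rate, 2))   # about Q * (1 - 1/ns) = 0.09 for ns = 10
```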
General model, Case 1: Q = P(Z_{n-1} = 1), simulation with t_Z = 1

[Figure: TE_{Z→X} versus n_s (0 to 50) for Case 1; analytic curve compared with simulations at T = 1000, 10000 and 100000.]
General model, Case 2: Q = P(Z_{n-1} ≠ 1), simulation with t_Z = 1

[Figure: TE_{Z→X} versus n_s (0 to 50) for Case 2; analytic curve compared with simulations at T = 1000, 10000 and 100000.]
General model, Case 3: Q ≈ 1/2, simulation with t_Z = 1

[Figure: TE_{Z→X} versus n_s (0 to 50) for Case 3; analytic curve compared with simulations at T = 1000, 10000 and 100000.]
Independent case: Null Model, simulation

[Figure: TE_{Z→X} versus n_s (0 to 50) for the null model of independent processes; analytic value compared with finite-sample estimates at T = 1K, 10K, 100K and 1M.]
Independent case: Null Model, simulation (logarithmic scale)

[Figure: the same null-model estimates on a log scale, with TE_{Z→X} values ranging from about 10^{-6} to 10^{1} across n_s = 0 to 50.]
Some Preliminary Results: How This Translates to Data Applications

When using a large n_s (for binning purposes), the existence of these spurious non-zero Transfer Entropy values means one might come to incorrect conclusions when applying the measure to data sets.

For our application, we simply use histograms for binning in order to estimate the probabilities. We used n_s = 20 in the following graphs, with T = 20000.
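For continuous signals such as EEG, histogram binning can be as simple as mapping each sample to one of n_s equal-width bins before feeding the discretised series to a Transfer Entropy estimator. A minimal sketch (`bin_series` is our name, and equal-width bins are an assumption; the talk only says histograms were used):

```python
import random

def bin_series(x, ns=20):
    """Discretise a continuous series into ns equal-width histogram bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / ns or 1.0   # guard against a constant series
    return [min(int((v - lo) / width), ns - 1) for v in x]

random.seed(6)
signal = [random.gauss(0, 1) for _ in range(20_000)]
binned = bin_series(signal, ns=20)
print(min(binned), max(binned))   # bin indices run from 0 to 19
```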
EEG (Electroencephalogram) Data

EEG recordings by Bjorn Cruts of the piano player, David Dolan (Guildhall School of Music and Drama), and the listener, Prof. Henrik Jensen. The pianist was asked to play at varying levels of musical interpretation and improvisation, namely:

Piece 1: Schubert Impromptu in G flat major Op. 90 No. 3, neutral mode, uninvolved
Piece 2: Schubert Impromptu in G flat major Op. 90 No. 3, fully involved
Piece 3: Improvisation, polyphonic, intellectual exercise
Piece 4: Improvisation, polyphonic, emotional letting go
Parts of the Brain

The EEG was recorded at 8 locations on both the player and the listener: the left and right

frontal cortex: attention, planning, working memory and inhibition
central cortex: controlling movements
temporal cortex: processing of sounds, language and multi-sensory integration
parietal cortex: visual processing, spatial positioning and short-term memory

The results displayed here use Transfer Entropy with k = 1 and l = 1.
The nodes
1 Right frontal cortex (RF)
2 Left frontal cortex (LF)
3 Right central cortex (RC)
4 Left central cortex (LC)
5 Right temporal cortex (RT)
6 Left temporal cortex (LT)
7 Right parietal cortex (RP)
8 Left parietal cortex (LP)
Transfer Entropy on Player

[Figure: 8 × 8 Transfer Entropy matrices between the player's nodes, one panel per piece (Pieces 1 to 4), with causal nodes on the horizontal axis; colour scales range up to roughly 0.08, 0.1, 0.04 and 0.16 respectively.]
Transfer Entropy on Listener

[Figure: 8 × 8 Transfer Entropy matrices between the listener's nodes, one panel per piece (Pieces 1 to 4), with causal nodes on the horizontal axis; colour scales range up to roughly 0.08, 0.06, 0.08 and 0.09 respectively.]
The nodes

1 Player's Right frontal cortex
2 Player's Left frontal cortex
3 Player's Right central cortex
4 Player's Left central cortex
5 Player's Right temporal cortex
6 Player's Left temporal cortex
7 Player's Right parietal cortex
8 Player's Left parietal cortex
9 Listener's Right frontal cortex
10 Listener's Left frontal cortex
11 Listener's Right central cortex
12 Listener's Left central cortex
13 Listener's Right temporal cortex
14 Listener's Left temporal cortex
15 Listener's Right parietal cortex
16 Listener's Left parietal cortex
Transfer Entropy of Both Player and Listener

[Figure: 16 × 16 Transfer Entropy matrices over all player and listener nodes, one panel per piece (Pieces 1 to 4), with causal nodes on the horizontal axis; colour scales range up to roughly 0.08, 0.1, 0.08 and 0.16 respectively.]
How the Listener Is Influenced by the Player (zoom into the lower left-hand corner of the Transfer Entropy matrix)

[Figure: player nodes 1 to 8 as causal nodes versus listener nodes 9 to 16, one panel per piece; colour scales start around 0.02 and range up to roughly 0.08, 0.06, 0.08 and 0.1 respectively.]
How the Player Is Influenced by the Listener (zoom into the upper right-hand corner of the Transfer Entropy matrix)

[Figure: listener nodes 9 to 16 as causal nodes versus player nodes 1 to 8, one panel per piece; colour scales range up to roughly 0.07, 0.09, 0.055 and 0.14 respectively.]
Take Home Message: What You Need to Remember If Nothing Else

We investigated the simplest form of Transfer Entropy and verified on simple models that it does indeed detect 'causal' interactions and also has the potential to detect time lags.

We recommend the use of this measure but caution the user about its pitfalls, especially when detecting causality in EEG data sets, namely:

a lack of statistics, which might give rise to spurious non-zero Transfer Entropy values;
(as Schreiber himself warns) conditioning on too many variables and drawing conclusions from non-zero values of Transfer Entropy.

Transfer Entropy, if applied correctly, has the potential to be a very accurate and easily applicable 'causality' measure.
References

T. Schreiber, Measuring Information Transfer, Phys. Rev. Lett. 85(2) (2000)

A. B. Barrett and A. K. Seth, Wiener-Granger Causality and Transfer Entropy are Equivalent for Gaussian Variables, Phys. Rev. Lett. 103, 238701 (2009)

S. L. Bressler and A. K. Seth, Wiener-Granger Causality: A Well Established Methodology, NeuroImage (2010), doi:10.1016/j.neuroimage.2010.02.059

S. Frenzel and B. Pompe, Partial Mutual Information for Coupling Analysis of Multivariate Time Series, Phys. Rev. Lett. 99, 204101 (2007)

T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York (1999)