Today: Entropy and Information Theory. Claude Shannon, Ph.D., 1916–2001


Today:

Entropy

<break>

Information Theory

Information Theory

Claude Shannon, Ph.D., 1916–2001

I(X;Y) = H(X) − H(X|Y)

H(X) = −Σ_i p(x_i) log2(p(x_i))

Entropy

A measure of the disorder in a system

Entropy: The (average) number of yes/no questions needed to completely specify the state of a system.

What if there were two coins?

2 states. 1 question.

4 states. 2 questions.

8 states. 3 questions.

16 states. 4 questions.

number of states = 2^(number of yes-no questions)

log2(number of states) = number of yes-no questions

H = log2(n)

H is entropy, the number of yes-no questions required to specify the state of the system.

n is the number of states of the system, assumed (for now) to be equally likely.

Consider Dice

The Six Sided Die

H = log2(6) = 2.585 bits

The Four Sided Die

H = log2(4) = 2.000 bits

The Twenty Sided Die

H = log2(20) = 4.322 bits

What about all three dice?

H = log2(4 · 6 · 20) = log2(4) + log2(6) + log2(20) = 8.907 bits

Entropy, from independent elements of a system, adds.
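
A quick numerical check of the dice slides (a minimal Python sketch; the helper name entropy_uniform is ours, not from the lecture):

```python
import math

# Entropy of a fair n-sided die: H = log2(n) bits
# (all n states equally likely).
def entropy_uniform(n):
    return math.log2(n)

for sides in (4, 6, 20):
    print(f"{sides}-sided die: H = {entropy_uniform(sides):.3f} bits")

# The three independent dice together have 4 * 6 * 20 = 480
# equally likely joint states, and the entropies add:
joint = entropy_uniform(4 * 6 * 20)
summed = sum(entropy_uniform(n) for n in (4, 6, 20))
print(f"log2(480)        = {joint:.3f} bits")
print(f"sum of the three = {summed:.3f} bits")  # same 8.907 bits
```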

H = log2(n)

Let’s rewrite this a bit... Trivial Fact 1: log2(x) = −log2(1/x)

H = −log2(1/n)

Trivial Fact 2: if there are n equally likely possibilities, p = 1/n.

H = −log2(p)

What if the n states are not equally probable? Maybe we should use the expected value of the entropies: a weighted average by probability.

H = −Σ_{i=1}^{n} p_i log2(p_i)

Let’s do a simple example: n = 2. How does H change as we vary p1 and p2?

H = −Σ_{i=1}^{n} p_i log2(p_i),  with n = 2 and p1 + p2 = 1
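
A small sketch of this example: sweep p1 from 0 to 1 and watch H (assumes the usual convention that 0 · log2(0) = 0):

```python
import math

# Binary entropy: H = -p1*log2(p1) - p2*log2(p2), with p2 = 1 - p1.
# Zero-probability terms are skipped (0*log2(0) -> 0).
def H2(p1):
    return -sum(x * math.log2(x) for x in (p1, 1.0 - p1) if x > 0)

for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p1 = {p1:.2f}  ->  H = {H2(p1):.3f} bits")
# H peaks at 1 bit when p1 = p2 = 1/2 and falls to 0 at p1 = 0 or 1.
```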

How about n = 3?

H = −Σ_{i=1}^{n} p_i log2(p_i),  with n = 3 and p1 + p2 + p3 = 1

The bottom line intuitions for Entropy (see the sketch after this list):

• Entropy is a statistic for describing a probability distribution.

• Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.

• Probability distributions which are peaked, sharp, narrow, compact etc. have LOW entropy.

• Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.

• Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
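
These intuitions are easy to verify numerically; here is a minimal sketch (the example distributions are illustrative choices, not from the lecture):

```python
import math

def entropy(ps):
    # H = -sum p_i log2(p_i), skipping zero-probability states.
    return -sum(p * math.log2(p) for p in ps if p > 0)

flat     = [0.25, 0.25, 0.25, 0.25]  # broad distribution -> HIGH entropy
peaked   = [0.85, 0.05, 0.05, 0.05]  # sharp distribution -> LOW entropy
definite = [1.0, 0.0, 0.0, 0.0]      # definite state     -> zero entropy

for name, ps in [("flat", flat), ("peaked", peaked), ("definite", definite)]:
    print(f"{name:8s} H = {entropy(ps):.3f} bits")
```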

Pop Quiz:

[four probability distributions shown, panels 1–4]

Entropy: The (average) number of yes/no questions needed to completely specify the state of a system.

At 11:16 am (Pacific) on June 29th of the year 2001, there were approximately 816,119 words in the English language.

H(English) = log2(816,119) ≈ 19.6 bits

Twenty Questions: 2^20 = 1,048,576

What’s a winning 20 Questions Strategy?
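
One winning strategy: make every question split the remaining candidates in half, which is just binary search. A toy sketch (the million-item "dictionary" and the numeric encoding are stand-ins for illustration):

```python
# Toy 20-questions player: each yes/no answer halves the candidate set,
# so k questions distinguish up to 2**k items (20 questions -> 1,048,576).
candidates = list(range(1_000_000))  # stand-in for a large word list
secret = 637_241

questions = 0
lo, hi = 0, len(candidates)          # invariant: secret is in candidates[lo:hi]
while hi - lo > 1:
    mid = (lo + hi) // 2
    questions += 1
    # "Is it in the lower half?" -- either answer halves the interval.
    if secret < candidates[mid]:
        hi = mid
    else:
        lo = mid
print(f"Found {candidates[lo]} in {questions} questions")  # at most 20
```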

<break>

I(X;Y) = H(X) − H(X|Y)

So, what is information?

It’s a change in what you don’t know.

It’s a change in the entropy.

Information as a measure of correlation

[bar plots over {heads, tails}: P(Y) and P(Y|x=heads), both flat at 1/2]

H(Y) = 1   H(Y|x=heads) = 1

I(X;Y) = H(Y) − H(Y|X) = 0 bits

Information as a measure of correlation

[bar plots over {heads, tails}: P(Y) flat at 1/2; P(Y|x=heads) peaked at heads]

H(Y) = 1   H(Y|x=heads) ≈ 0

I(X;Y) = H(Y) − H(Y|X) ≈ 1 bit
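
The two-coin panels can be reproduced from a joint distribution; a minimal sketch (the joint probability tables are illustrative choices, not data from the lecture):

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

def mutual_information(joint):
    # joint[x][y] = P(X=x, Y=y); compute I(X;Y) = H(Y) - H(Y|X).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    H_y_given_x = sum(p_x * H([pxy / p_x for pxy in row])
                      for p_x, row in zip(px, joint) if p_x > 0)
    return H(py) - H_y_given_x

independent = [[0.25, 0.25],   # two fair coins, no correlation
               [0.25, 0.25]]
correlated  = [[0.49, 0.01],   # y almost always copies x
               [0.01, 0.49]]

print(f"independent: I = {mutual_information(independent):.3f} bits")  # 0.000
print(f"correlated:  I = {mutual_information(correlated):.3f} bits")   # ~0.86
```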

Information Theory in Neuroscience

The Critical Observation:

Information is Mutual

I(X;Y) = I(Y;X)

H(Y)-H(Y|X) = H(X)-H(X|Y)
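
One way to see the symmetry: both sides equal H(X) + H(Y) − H(X,Y). A standard derivation using the chain rule H(X,Y) = H(X) + H(Y|X) (an identity we are supplying, not from the slides):

```latex
\begin{align*}
I(X;Y) &= H(Y) - H(Y|X)
        = H(Y) - \bigl(H(X,Y) - H(X)\bigr) \\
       &= H(X) + H(Y) - H(X,Y) \\
       &= H(X) - \bigl(H(X,Y) - H(Y)\bigr)
        = H(X) - H(X|Y) = I(Y;X)
\end{align*}
```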

The Critical Observation:

What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike.

I(Stimulus;Spike) = I(Spike;Stimulus)


The Critical Observation:

What our stimulus choice tells us about the likelihood of a spike.

[diagram: stimulus → response]

This, we can measure....

How to use Information Theory:

Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
Estimate: P( neural response )
From those, compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)
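
A bare-bones plug-in version of that recipe (the trial data and variable names are made up for illustration; real estimates need far more trials and bias correction, as the next slide warns):

```python
import math
from collections import Counter

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Toy experiment log: (stimulus presented, neural response) per trial.
trials = [("A", "spike"), ("A", "spike"), ("A", "silent"),
          ("B", "silent"), ("B", "silent"), ("B", "spike"),
          ("A", "spike"), ("B", "silent")]

n = len(trials)
# Estimate P(response) and its entropy H(response).
p_resp = [c / n for c in Counter(r for _, r in trials).values()]

# H(response | stimulus): entropy of the response distribution within
# each stimulus condition, weighted by how often that stimulus occurred.
H_cond = 0.0
for stim, n_s in Counter(s for s, _ in trials).items():
    within = [c / n_s for c in
              Counter(r for s, r in trials if s == stim).values()]
    H_cond += (n_s / n) * H(within)

print(f"I(response; stimulus) = {H(p_resp) - H_cond:.3f} bits")
```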

How to screw it up:

Choose stimuli which are not representative.
Measure the “wrong” aspect of the response.
Don’t take enough data to estimate P( ) well.
Use a crappy method of computing H( ).
Calculate I( ) and report it without comparing it to anything...

Here’s an example of Information Theory applied appropriately:

Temporal Coding of Visual Information in the Thalamus
Pamela Reinagel and R. Clay Reid, J. Neurosci. 20(14):5392–5400 (2000)

LGN responses are very reliable.


Is there information in the temporal pattern of spikes?

Patterns of Spikes in the LGN

Discretize the spike train into binary words (1 = spike in a time bin, 0 = no spike):

1-bit words: 0, 1
2-bit words: 00, 10, 01, 11
3-bit words: 000, 101, 011, 100
6-bit words: 000100, 101101, 011110, 010001

P( spike pattern)

P( spike pattern | stimulus )
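
A sketch of the word-counting analysis in that spirit: estimate P(word) and P(word | stimulus) from repeated trials and take the difference in entropies (the spike trains below are fabricated placeholders, not Reinagel & Reid's data):

```python
import math
from collections import Counter

def H_from_counts(counts):
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def words(spike_train, L):
    # Slice a binary spike train (e.g. "000100...") into L-bit words.
    return [spike_train[i:i + L]
            for i in range(0, len(spike_train) - L + 1, L)]

# Fabricated repeated-trial responses to two stimuli.
responses = {"stim1": ["00010010", "00010110", "00010010"],
             "stim2": ["10110100", "10100100", "10110101"]}

L = 2
all_words = Counter(w for trains in responses.values()
                    for t in trains for w in words(t, L))
H_word = H_from_counts(all_words)

# Noise entropy H(word | stimulus): stimuli are shown equally often
# here, so a simple average across conditions suffices.
H_noise = sum(H_from_counts(Counter(w for t in trains for w in words(t, L)))
              for trains in responses.values()) / len(responses)

print(f"L={L}: I = H(word) - H(word|stim) = {H_word - H_noise:.3f} bits/word")
```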

There is some extra information in temporal patterns of spikes.

Claude Shannon, Ph.D., 1916–2001

Recommended: Prof. Tom Cover, EE376A & B