Today: Entropy and Information Theory. Claude Shannon, Ph.D., 1916–2001


Today:

Entropy

<break>

Information Theory

Information Theory

Claude Shannon, Ph.D., 1916–2001

I(X;Y) = H(X) − H(X|Y)

H(X) = −Σ_i p(x_i) log2(p(x_i))

Entropy

A measure of the disorder in a system

Entropy: The (average) number of yes/no questions needed to completely specify the state of a system.

What if there were two coins?

2 states. 1 question.

4 states. 2 questions.

8 states. 3 questions.

16 states. 4 questions.

number of states = 2^(number of yes-no questions)

log2(number of states) = number of yes-no questions

H = log2(n)

H is entropy, the number of yes-no questions required to specify the state of the system.

n is the number of states of the system, assumed (for now) to be equally likely.

Consider Dice

The Six Sided Die

H = log2(6) = 2.585 bits

The Four Sided Die

H = log2(4) = 2.000 bits

The Twenty Sided Die

H = log2(20) = 4.322 bits

What about all three dice?

H = log2(4 · 6 · 20) = log2(4) + log2(6) + log2(20) = 8.907 bits

Entropy, from independent elements of a system, adds.
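
A quick numerical check of the dice slides (a minimal Python sketch; the helper name entropy_uniform is ours, not from the lecture):

```python
import math

# Entropy of a fair n-sided die: H = log2(n) bits
# (all n states equally likely).
def entropy_uniform(n):
    return math.log2(n)

for sides in (4, 6, 20):
    print(f"{sides}-sided die: H = {entropy_uniform(sides):.3f} bits")

# The three independent dice together have 4 * 6 * 20 = 480
# equally likely joint states, and the entropies add:
joint = entropy_uniform(4 * 6 * 20)
summed = sum(entropy_uniform(n) for n in (4, 6, 20))
print(f"log2(480)        = {joint:.3f} bits")
print(f"sum of the three = {summed:.3f} bits")  # same 8.907 bits
```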

H = log2(n)

Let’s rewrite this a bit... Trivial Fact 1: log2(x) = −log2(1/x)

H = −log2(1/n)

Trivial Fact 2: if there are n equally likely possibilities, p = 1/n.

H = −log2(p)

What if the n states are not equally probable? Maybe we should use the expected value of the entropies: a weighted average by probability.

H = −Σ_{i=1}^{n} p_i log2(p_i)

Let’s do a simple example: n = 2. How does H change as we vary p1 and p2?

H = −Σ_{i=1}^{n} p_i log2(p_i),  with n = 2 and p1 + p2 = 1
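
A small sketch of this example: sweep p1 from 0 to 1 and watch H (assumes the usual convention that 0 · log2(0) = 0):

```python
import math

# Binary entropy: H = -p1*log2(p1) - p2*log2(p2), with p2 = 1 - p1.
# Zero-probability terms are skipped (0*log2(0) -> 0).
def H2(p1):
    return -sum(x * math.log2(x) for x in (p1, 1.0 - p1) if x > 0)

for p1 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p1 = {p1:.2f}  ->  H = {H2(p1):.3f} bits")
# H peaks at 1 bit when p1 = p2 = 1/2 and falls to 0 at p1 = 0 or 1.
```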

How about n = 3?

H = −Σ_{i=1}^{n} p_i log2(p_i),  with n = 3 and p1 + p2 + p3 = 1

The bottom line intuitions for Entropy (see the sketch after this list):

• Entropy is a statistic for describing a probability distribution.

• Probability distributions which are flat, broad, sparse, etc. have HIGH entropy.

• Probability distributions which are peaked, sharp, narrow, compact etc. have LOW entropy.

• Entropy adds for independent elements of a system, thus entropy grows with the dimensionality of the probability distribution.

• Entropy is zero IFF the system is in a definite state, i.e. p = 1 somewhere and 0 everywhere else.
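
These intuitions are easy to verify numerically; here is a minimal sketch (the example distributions are illustrative choices, not from the lecture):

```python
import math

def entropy(ps):
    # H = -sum p_i log2(p_i), skipping zero-probability states.
    return -sum(p * math.log2(p) for p in ps if p > 0)

flat     = [0.25, 0.25, 0.25, 0.25]  # broad distribution -> HIGH entropy
peaked   = [0.85, 0.05, 0.05, 0.05]  # sharp distribution -> LOW entropy
definite = [1.0, 0.0, 0.0, 0.0]      # definite state     -> zero entropy

for name, ps in [("flat", flat), ("peaked", peaked), ("definite", definite)]:
    print(f"{name:8s} H = {entropy(ps):.3f} bits")
```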

Pop Quiz:

[four probability distributions shown, panels 1–4]

Entropy: The (average) number of yes/no questions needed to completely specify the state of a system.

At 11:16 am (Pacific) on June 29th of the year 2001, there were approximately 816,119 words in the English language.

H(English) = log2(816,119) ≈ 19.6 bits

Twenty Questions: 2^20 = 1,048,576

What’s a winning 20 Questions Strategy?
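
One winning strategy: make every question split the remaining candidates in half, which is just binary search. A toy sketch (the million-item "dictionary" and the numeric encoding are stand-ins for illustration):

```python
# Toy 20-questions player: each yes/no answer halves the candidate set,
# so k questions distinguish up to 2**k items (20 questions -> 1,048,576).
candidates = list(range(1_000_000))  # stand-in for a large word list
secret = 637_241

questions = 0
lo, hi = 0, len(candidates)          # invariant: secret is in candidates[lo:hi]
while hi - lo > 1:
    mid = (lo + hi) // 2
    questions += 1
    # "Is it in the lower half?" -- either answer halves the interval.
    if secret < candidates[mid]:
        hi = mid
    else:
        lo = mid
print(f"Found {candidates[lo]} in {questions} questions")  # at most 20
```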

<break>

I(X;Y) = H(X) − H(X|Y)

So, what is information?

It’s a change in what you don’t know.

It’s a change in the entropy.

Information as a measure of correlation

[bar plots over {heads, tails}: P(Y) and P(Y|x=heads), both flat at 1/2]

H(Y) = 1   H(Y|x=heads) = 1

I(X;Y) = H(Y) − H(Y|X) = 0 bits

Information as a measure of correlation

[bar plots over {heads, tails}: P(Y) flat at 1/2; P(Y|x=heads) peaked at heads]

H(Y) = 1   H(Y|x=heads) ≈ 0

I(X;Y) = H(Y) − H(Y|X) ≈ 1 bit
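
The two-coin panels can be reproduced from a joint distribution; a minimal sketch (the joint probability tables are illustrative choices, not data from the lecture):

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

def mutual_information(joint):
    # joint[x][y] = P(X=x, Y=y); compute I(X;Y) = H(Y) - H(Y|X).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    H_y_given_x = sum(p_x * H([pxy / p_x for pxy in row])
                      for p_x, row in zip(px, joint) if p_x > 0)
    return H(py) - H_y_given_x

independent = [[0.25, 0.25],   # two fair coins, no correlation
               [0.25, 0.25]]
correlated  = [[0.49, 0.01],   # y almost always copies x
               [0.01, 0.49]]

print(f"independent: I = {mutual_information(independent):.3f} bits")  # 0.000
print(f"correlated:  I = {mutual_information(correlated):.3f} bits")   # ~0.86
```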

Information Theory in Neuroscience

The Critical Observation:

Information is Mutual

I(X;Y) = I(Y;X)

H(Y)-H(Y|X) = H(X)-H(X|Y)
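
One way to see the symmetry: both sides equal H(X) + H(Y) − H(X,Y). A standard derivation using the chain rule H(X,Y) = H(X) + H(Y|X) (an identity we are supplying, not from the slides):

```latex
\begin{align*}
I(X;Y) &= H(Y) - H(Y|X)
        = H(Y) - \bigl(H(X,Y) - H(X)\bigr) \\
       &= H(X) + H(Y) - H(X,Y) \\
       &= H(X) - \bigl(H(X,Y) - H(Y)\bigr)
        = H(X) - H(X|Y) = I(Y;X)
\end{align*}
```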

The Critical Observation:

What a spike tells the brain about the stimulus is the same as what our stimulus choice tells us about the likelihood of a spike.

I(Stimulus;Spike) = I(Spike;Stimulus)


The Critical Observation:

What our stimulus choice tells us about the likelihood of a spike.

[diagram: stimulus → response]

This, we can measure....

How to use Information Theory:

Show your system stimuli. Measure neural responses.
Estimate: P( neural response | stimulus presented )
Estimate: P( neural response )
From those, compute: H(neural response) and H(neural response | stimulus presented)
Calculate: I(response ; stimulus)
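
A bare-bones plug-in version of that recipe (the trial data and variable names are made up for illustration; real estimates need far more trials and bias correction, as the next slide warns):

```python
import math
from collections import Counter

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# Toy experiment log: (stimulus presented, neural response) per trial.
trials = [("A", "spike"), ("A", "spike"), ("A", "silent"),
          ("B", "silent"), ("B", "silent"), ("B", "spike"),
          ("A", "spike"), ("B", "silent")]

n = len(trials)
# Estimate P(response) and its entropy H(response).
p_resp = [c / n for c in Counter(r for _, r in trials).values()]

# H(response | stimulus): entropy of the response distribution within
# each stimulus condition, weighted by how often that stimulus occurred.
H_cond = 0.0
for stim, n_s in Counter(s for s, _ in trials).items():
    within = [c / n_s for c in
              Counter(r for s, r in trials if s == stim).values()]
    H_cond += (n_s / n) * H(within)

print(f"I(response; stimulus) = {H(p_resp) - H_cond:.3f} bits")
```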

How to screw it up:

Choose stimuli which are not representative.
Measure the “wrong” aspect of the response.
Don’t take enough data to estimate P( ) well.
Use a crappy method of computing H( ).
Calculate I( ) and report it without comparing it to anything...

Here’s an example of Information Theory applied appropriately:

Temporal Coding of Visual Information in the Thalamus
Pamela Reinagel and R. Clay Reid, J. Neurosci. 20(14):5392–5400 (2000)

LGN responses are very reliable.


Is there information in the temporal pattern of spikes?

Patterns of Spikes in the LGN

Discretize the spike train into binary words (1 = spike in a time bin, 0 = no spike):

1-bit words: 0, 1
2-bit words: 00, 10, 01, 11
3-bit words: 000, 101, 011, 100
6-bit words: 000100, 101101, 011110, 010001

P( spike pattern)

P( spike pattern | stimulus )
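
A sketch of the word-counting analysis in that spirit: estimate P(word) and P(word | stimulus) from repeated trials and take the difference in entropies (the spike trains below are fabricated placeholders, not Reinagel & Reid's data):

```python
import math
from collections import Counter

def H_from_counts(counts):
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def words(spike_train, L):
    # Slice a binary spike train (e.g. "000100...") into L-bit words.
    return [spike_train[i:i + L]
            for i in range(0, len(spike_train) - L + 1, L)]

# Fabricated repeated-trial responses to two stimuli.
responses = {"stim1": ["00010010", "00010110", "00010010"],
             "stim2": ["10110100", "10100100", "10110101"]}

L = 2
all_words = Counter(w for trains in responses.values()
                    for t in trains for w in words(t, L))
H_word = H_from_counts(all_words)

# Noise entropy H(word | stimulus): stimuli are shown equally often
# here, so a simple average across conditions suffices.
H_noise = sum(H_from_counts(Counter(w for t in trains for w in words(t, L)))
              for trains in responses.values()) / len(responses)

print(f"L={L}: I = H(word) - H(word|stim) = {H_word - H_noise:.3f} bits/word")
```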

There is some extra information in temporal patterns of spikes.

Claude Shannon, Ph.D., 1916–2001

Recommended: Prof. Tom Cover, EE376A & B