Overview of Information Theory, Including a Brief Introduction to Claude Shannon (The “Father” of Information Theory)


Page 1:

Overview of Information Theory
Including a Brief Introduction to Claude Shannon
(The “Father” of Information Theory)

Page 2:

What is Information Theory?

• Basic concepts of information theory
• Elements of set theory and probability theory
• Measures of information and uncertainty; entropy
• Basic concepts of communication channel organization
• Basic principles of encoding
• Error-detecting and error-correcting codes

Page 3:

Information

• What does the word “information” mean?
• There is no exact definition (!), however:

Information carries specific knowledge that is new to its recipient;

Information is always carried by some specific carrier in different forms (letters, digits, specific symbols, sequences of digits, letters, and symbols, etc.);

Information is meaningful only if the recipient is able to interpret it.

Page 4:

Information
• According to the Oxford English Dictionary, the earliest historical meaning of the word information in English was the act of informing, or giving form or shape to the mind.
• The English word was apparently derived from the verb to inform by adding the common “noun of action” ending “-ation”.
• Information, when materialized, is a message.
• Information is always about something (the size of a parameter, the occurrence of an event, etc.).
• Viewed in this manner, information does not have to be accurate; it may be a truth or a lie. Even disruptive noise used to inhibit the flow of communication and create misunderstanding would, in this view, be a form of information.
• However, generally speaking, if the amount of information in the received message increases, the message becomes more accurate.

Page 5:

Questions:

• How can we measure the amount of information?

• How can we ensure the correctness of information?

• What to do if information gets corrupted by errors?

• How much memory does it require to store information?

Page 6:

• The basic answers to these questions, which formed a solid background of modern Information Theory, were given by the great American mathematician, electrical engineer, and computer scientist Claude E. Shannon in his paper “A Mathematical Theory of Communication”, published in The Bell System Technical Journal in October 1948.

Page 7:

Claude Elwood Shannon (1916-2001)

The “Father” of Information Theory

Shannon is ALSO the “Father” of Practical Digital Circuit Design Theory (MS Thesis at MIT, 1938)

Career: Bell Laboratories (1941-1972), MIT (1956-2001)

Page 8:

Brief Comments on Shannon’s MS Thesis
Note: His MS Thesis (MIT, 1938) was on a topic that is separate & distinct from Information Theory:
“A Symbolic Analysis of Relay and Switching Circuits”. Published as a journal article in Transactions of the American Institute of Electrical Engineers, vol. 57 (1938), pp. 713-723.

“Claude E. Shannon, founder of what is often called Information Theory, in his master’s thesis, showed in a masterful way how the analysis of complicated circuits for switching could be effected by the use of Boolean algebra.”

“This surely must be one of the most important Master’s Theses ever written!.. It was a landmark in that it helped to change digital circuit design from an art to a science.”

From a review of his thesis by science historian Herman Goldstine.

• An almost-new-condition, bound copy of his thesis is for sale on the web for $18,000!!!

Page 9:

More Comments
Claude Shannon, MS Thesis, MIT, 1938: “A Symbolic Analysis of Relay and Switching Circuits”
• “In his work, Shannon showed how Boolean algebra could be used in the analysis and synthesis of switching and computer circuits. The thesis aroused considerable interest when it appeared in 1938 in the A.I.E.E. Transactions.”
• “In 1940 it was awarded the Alfred Noble Prize* of the Combined Engineering Societies of the United States.”
• This award is given each year to a person under 35 years old for a paper published in one of the journals of the participating societies.

* Not affiliated in any way with the Nobel Prize!

Page 10:

• Shannon defined the basics of how modern computers work, before there were any modern computers!!

• As a boy, he designed & built a working model plane & a telegraph system that connected his bedroom to a friend's bedroom ½ mile away. In adolescence he ran a fix-it shop in a back room at a local drug store.

• His MS advisor was Norbert Wiener. He was fascinated by Boole's Laws of Thought. His MS thesis showed how Boolean algebra could be applied in computer circuitry by organizing data as a series of simple yes/no switches.

• The ideas in his 1948 paper are now sometimes called the “Magna Carta” of the Information Age. In that paper, he calculated the maximum rate, or “channel capacity”, in binary digits per second (“bits”, a term Shannon popularized but credited to John W. Tukey), of communication transmission over media: first over telephone lines, later in optical communications, & later in wireless communications.

Page 11:

• Besides founding Information Theory, in 1949 Shannon wrote a major paper on cryptology that changed the way secure messages are coded: “Communication Theory of Secrecy Systems”.
• It applied information theory to secrecy, analyzing how the “redundancy” of a message determines whether an encrypted version of it can be deciphered. In addition to being the Founder of Information Theory, he is also known as the Founder of modern Cryptology.
• Only SOME of his other major achievements:

1. Defined (information) entropy as a measure of efficiency in a communications system.

2. First to program a computer to play chess.

3. Designed & built the world’s first wearable computer.

4. Wrote several important papers on communication theory, cryptography, & information theory.

5. Worked on the differential analyzer, a mechanical computer.

Page 12:

More of Shannon’s Discoveries, Breakthroughs, and Inventions
• A Communication System Employing Pulse Code Modulation (used to help break German & Japanese codes in WWII)
• An Algebra for Theoretical Genetics
• A Theorem on Coloring the Lines of a Network
• A Mathematical Theory of the Differential Analyzer
• Sampling Theory
• Theory of Linear Differential & Smoothing Operators
• Theory & Design of Linear Differential Equation Machines
• Invented Electronic Methods in Telephone Switching
• Lattice Theory of Information
• Invented a Method of Signal Transmission to a Moving Vehicle
• The Portfolio Problem solution + many, many more!

A few hundred published papers, several dozen patents!

Page 13:

• For his own amusement, Shannon invented & built many original toys. Just a few:

– A motorized pogo stick.
– A calculator that takes input & gives answers only in Roman numerals.
– A “Roulette Bettors’ Helper”: a special device to predict winning numbers on the roulette wheel. It was successful, but how much money was won with the help of that device is unknown.
– Numerous fully functional three-ball juggling machines, including juggling clowns. He worked out the math of juggling & proved a theorem now known as “Shannon’s Juggling Theorem”.
– A mechanical mouse that could navigate a maze.
– A working “ultimate machine”: a box with a switch that, when switched on, powered a mechanical hand that emerged from the box to switch the switch back off, then withdrew into the box as the mechanism powered down.

Page 14:

As you can tell from his toy inventions, Claude Shannon fit the “eccentric genius” stereotype.

• He not only built machines to juggle, but he was quite good at juggling himself!
• For most of his career at MIT, he traveled around the university, indoors & outdoors, on his unicycle!
– His first unicycle, but certainly not his last, was a gift from his wife Betty, who knew he loved gadgets.
– Within a few days he was riding around the block, and in a few weeks he was doing so while juggling three balls!
– To make it more interesting, he later assembled a unicycle with an eccentric wheel, so the rider would move up & down as he pedaled forward.

Page 15:

Back to Information Theory: Information Content

• What is the information content of a message?
• Shannon’s answer: the information content of a message is simply the number of 1’s and 0’s it takes to transmit it.

Page 16:

• The elementary unit of information is a binary unit: a bit, which can be either 1 or 0; “true” or “false”; “yes” or “no”; “black” or “white”; etc.

• A basic postulate of information theory is that information can be treated like a measurable physical quantity, such as density or mass.

• Suppose you flip a coin one million times and write down the sequence of results. If you want to communicate this sequence to another person, how many bits will it take?

• If it's a fair coin, the two possible outcomes, heads and tails, occur with equal probability. Therefore each flip requires 1 bit of information to transmit. To send the entire sequence will require one million bits.

Page 17:

• Suppose the coin is biased so that heads occur only 1/4 of the time, and tails occur 3/4 of the time. Then the entire sequence can be sent in about 811,300 bits, on average. This would seem to imply that each flip of the coin requires just 0.8113 bits to transmit.
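These per-flip figures are just the binary entropy of the coin. A minimal Python sketch to check them (the helper function is mine; the slide’s 811,300 is a rounding of the exact 811,278):

```python
import math

def entropy(p: float) -> float:
    """Shannon entropy (in bits per symbol) of a binary source with bias p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy(0.5))          # 1.0  -> a fair coin costs 1 bit per flip
h = entropy(0.25)            # biased coin: heads 1/4, tails 3/4
print(round(h, 4))           # 0.8113 bits per flip
print(round(h * 1_000_000))  # ~811,278 bits for a million flips
```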

• How can you transmit a coin flip in less than one bit, when the only language available is that of zeros and ones?

• Obviously, you can't. But if the goal is to transmit an entire sequence of flips, and the distribution is biased in some way, then you can use your knowledge of the distribution to select a more efficient code.

• Another way to look at it is: a sequence of biased coin flips contains less "information" than a sequence of unbiased flips, so it should take fewer bits to transmit.

• Information Theory regards information as only those symbols that are uncertain to the receiver.

• For years, people have sent telegraph messages, leaving out non-essential words such as "a" and "the."

• In the same vein, predictable symbols can be left out, like in the sentence, "only infrmatn esentil to understandn mst b tranmitd”. Shannon made clear that uncertainty is the very commodity of communication.

Page 18:

• Suppose we transmit a long sequence of one million bits corresponding to the first example. What should we do if some errors occur during this transmission?

• And if the length of the sequence to be transmitted or stored is even larger than 1 million bits, say 1 billion bits… what should we do?

Page 19:

Two Main Questions of Information Theory

• What to do if information gets corrupted by errors?

• How much memory does it require to store data?

• Both questions were asked, and to a large degree answered, by Claude Shannon in his 1948 article: use error correction and data compression.

Page 20:

Shannon’s Basic Principles of Information Theory

• Shannon’s theory told engineers how much information could be transmitted over the channels of an ideal system.

• He also spelled out mathematically the principles of data compression, which exploit what the end of this sentence demonstrates: that “only infrmatn esentil to understadn mst b tranmitd”.

• He also showed how we could transmit information over noisy channels at error rates we could control.

Page 21:

Why is Information Theory Important?

• Thanks in large measure to Shannon’s insights, digital systems have come to dominate the world of communications and information processing:
– Modems
– Satellite communications
– Data storage
– Deep-space communications
– Wireless technology

Page 22:

Channels
• A channel is used to get information across:

Source → binary channel (0,1,1,0,0,1,1) → Receiver

Many systems act like channels. Some obvious ones: phone lines, Ethernet cables. Less obvious ones: the air when speaking, the TV screen when watching, paper when writing an article, etc.

These are physical devices and hence prone to errors.

Page 23:

Noisy Channels

A noiseless binary channel transmits bits without error: 0 → 0 and 1 → 1.

A noisy, symmetric binary channel flips each bit with probability p: 0 → 0 and 1 → 1 with probability 1–p, but 0 → 1 and 1 → 0 with probability p.

What should we do if we have a noisy channel and want to send information across reliably?
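Before answering, it may help to see such a channel in action. A minimal simulation sketch of the binary symmetric channel (the function name bsc and all parameters are illustrative, not from the slides):

```python
import random

def bsc(bits, p):
    """Binary symmetric channel: flip each bit independently with probability p."""
    return [b ^ (random.random() < p) for b in bits]

random.seed(0)
msg = [random.randint(0, 1) for _ in range(100_000)]
received = bsc(msg, p=0.1)
errors = sum(m != r for m, r in zip(msg, received))
print(errors / len(msg))  # ~0.1, the channel's crossover probability
```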

Page 24:

Error Correction pre-Shannon
• Primitive error correction (assume p < ½): instead of sending “0” and “1”, send “0…0” and “1…1”.

• The receiver takes the majority of the bit values as the ‘intended’ value of the sender.

• Example: If we repeat the bit value three times, the error probability goes down from p to p²(3–2p). Hence for p = 0.1 we reduce the error to 0.028.

• However, now we have to send 3 bits to get one bit of information across, and this will get worse if we want to reduce the error rate further…
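A quick check of the p²(3–2p) figure, both analytically and by simulating the three-copy majority vote (helper names are illustrative):

```python
import random

def majority_error(p: float) -> float:
    """Analytic error probability of the 3-repetition code under majority vote:
    at least 2 of the 3 copies must flip, i.e. 3p^2(1-p) + p^3 = p^2(3 - 2p)."""
    return p * p * (3 - 2 * p)

print(majority_error(0.1))  # 0.028

# Monte Carlo check: each of the 3 copies flips independently with prob. p,
# and decoding fails when the majority (2 or 3) of the copies flipped.
random.seed(1)
p, trials = 0.1, 200_000
wrong = sum(
    sum(random.random() < p for _ in range(3)) >= 2
    for _ in range(trials)
)
print(wrong / trials)  # ~0.028
```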

Page 25:

Channel Rate
• When correcting errors, we have to be mindful of the rate: the number of information bits carried per transmitted bit (in the previous example the rate was 1/3).

• If we want to send data with arbitrarily small error this way, we need arbitrarily low rates r, which is costly.

Page 26:

Error Correction by Shannon
• Shannon’s basic observations:
• Correcting single bits is very wasteful and inefficient; instead, we should correct blocks of bits.
• Shannon showed that by doing so we can get arbitrarily small errors at the constant channel rate 1–H(p), where H(p) is the Shannon Information Entropy, defined by

H(p) = –p log2(p) – (1–p) log2(1–p)
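A short sketch evaluating H(p) and the resulting best rate 1–H(p) (the function name is mine; only the formula comes from the slide):

```python
import math

def H(p: float) -> float:
    """Shannon information entropy H(p) = -p log2(p) - (1-p) log2(1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Best achievable rate over a binary symmetric channel with flip probability p:
for p in (0.01, 0.1, 0.25):
    print(p, round(1 - H(p), 4))
# p = 0.1 gives rate 1 - H(0.1) ~= 0.531: about 1.88 transmitted bits per
# information bit, far better than the rate-1/3 repetition code above.
```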

Page 27:

Model for a Communication System

• Communication systems are of a statistical nature: the performance of a system can never be described in a deterministic sense; it is always given in statistical terms.

• A source is a device that selects and transmits sequences of symbols from a given alphabet. Each selection is made at random, although it may be governed by some statistical rule.

Page 28:

• The channel transmits the incoming symbols to the receiver. The performance of the channel is also based on laws of chance.

• If the source transmits a symbol A with a probability of P{A}, and the channel lets through the letter A with a probability denoted by P{A|A}, then the probability of transmitting A and receiving A is P{A}∙P{A|A}.
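For concreteness, with hypothetical numbers (not from the slides): if P{A} = 0.5 and the channel delivers A intact with P{A|A} = 0.9, then the probability of transmitting A and receiving A is P{A}∙P{A|A} = 0.5 ∙ 0.9 = 0.45.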

Page 29:

Model for a Communication System
• The channel is generally lossy: a part of the transmitted content does not reach its destination, or it reaches the destination in a distorted form.

Page 30:

• A very important task is the minimization of the loss and the optimum recovery of the original content when it is corrupted by the effect of noise.

• A method that is used to improve the efficiency of the channel is called encoding.

• An encoded message is less sensitive to noise.

Page 31:

• Decoding is employed to transform the encoded messages into the original form, which is acceptable to the receiver.

Encoding: F: I → F(I)
Decoding: F⁻¹: F(I) → I
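As a minimal sketch of this F / F⁻¹ pairing, here is the three-bit repetition code from the earlier slide written as an encode/decode pair (Python; the names are illustrative):

```python
def encode(bits):
    """F: I -> F(I). Triple each bit (the repetition code from the earlier slide)."""
    return [b for b in bits for _ in range(3)]

def decode(coded):
    """F^-1: F(I) -> I. Majority vote over each block of three received bits."""
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

msg = [1, 0, 1, 1]
sent = encode(msg)          # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] = 1                 # channel noise flips one bit...
assert decode(sent) == msg  # ...but decoding still recovers the original
```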

Page 32:

A Quantitative Measure of Information

• Suppose we have to select some equipment from a catalog which lists n distinct models x₁, x₂, …, xₙ.

• The desired amount of information I(xₖ) associated with the selection of a particular model xₖ must be a function of the probability of choosing xₖ:

I(xₖ) = f(P{xₖ})

Page 33:

• If, for simplicity, we assume that each one of these models is selected with equal probability, then the desired amount of information is only a function of n:

I(xₖ) = f(1/n)

• If each piece of equipment can be ordered in one of m different colors cⱼ, and the selection of colors is also equiprobable, then the amount of information associated with the selection of a color is:

I(cⱼ) = f(P{cⱼ}) = f(1/m)

Page 34:

• The selection may be done in two ways:

• Select the equipment and then the color, independently of each other:

I(xₖ & cⱼ) = I(xₖ) + I(cⱼ) = f(1/n) + f(1/m)

• Select the equipment and its color at the same time, as one selection from mn possible choices:

I(xₖ & cⱼ) = f(1/mn)

• Since these amounts of information are identical, we obtain:

f(1/n) + f(1/m) = f(1/mn)

Page 35:

• Among several solutions of this functional equation, the most important for us is f(x) = –log x, so that

f(1/n) = log n

• Thus, when a statistical experiment has n equiprobable outcomes, the average amount of information associated with an outcome is log n.

• The logarithmic information measure has the desirable property of additivity for independent statistical experiments.

• The simplest case to consider is a selection between two equiprobable events. The amount of information associated with the selection of one out of two provides a unit of information known as a bit:

–log2(1/2) = log2(2) = 1
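A small sketch verifying the additivity property and the unit, taking f(x) = –log₂ x (base 2, so the answers come out in bits; the values n = 8 and m = 4 are illustrative):

```python
import math

def f(p: float) -> float:
    """f(p) = -log2(p): information (in bits) of an outcome with probability p."""
    return -math.log2(p)

n, m = 8, 4
# Additivity: selecting among n models and then among m colors carries the
# same information as one selection among n*m equiprobable choices.
assert math.isclose(f(1 / n) + f(1 / m), f(1 / (n * m)))
print(f(1 / n), f(1 / m), f(1 / (n * m)))  # 3.0 2.0 5.0

# The unit: choosing one of two equiprobable outcomes is exactly one bit.
print(f(1 / 2))  # 1.0
```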