Course Information / Overview

Lecture 1: Introduction

I-Hsiang Wang
Department of Electrical Engineering, National Taiwan University

September 22, 2015
1 / 46 I-Hsiang Wang IT Lecture 1
Information Theory

Information Theory is a mathematical theory of information.

Information is usually obtained by receiving some "messages" (speech, text, images, etc.) from others. When obtaining information from a message, you may care about:
- What is the meaning of the message?
- How important is the message?
- How much information can I get from the message?

Information theory is about the quantification of information.
Information Theory

Information Theory is a mathematical theory of information (primarily for communication systems) that
- establishes the fundamental limits of communication systems (quantifies the amount of information that can be delivered from one party to another),
- is built upon probability theory and statistics,
- is mainly concerned with the ultimate performance limit (usually the rate of information processing) as certain resources (usually the total amount of time) scale to the asymptotic regime, given that the desired information is delivered "reliably".
In this course, we will
1. Establish solid foundations and intuitions of information theory,
2. Introduce explicit methods to achieve information-theoretic limits,
3. Demonstrate further applications of information theory beyond communications.

Later, we give a brief overview of information theory and the materials to be covered in this course.
1. Course Information
2. Overview
Logistics

1. Instructor: I-Hsiang Wang 王奕翔
   Email: [email protected]
   Office: MD-524 明達館 524 室
   Office Hours: 17:00 – 18:00, Monday and Tuesday
2. Lecture Time: 13:20 – 14:10 (6) Tuesday, and 10:20 – 12:10 (34) Wednesday
3. Lecture Location: EE2-225 電機二館 225 室
4. Course Website: http://homepage.ntu.edu.tw/~ihwang/Teaching/Fa15/IT.html
5. Prerequisites: Probability, Linear Algebra.
Logistics

6. Grading: Homework (35%), Midterm (30%), Final (35%)
7. References:
   - T. Cover and J. Thomas, Elements of Information Theory, 2nd Edition, Wiley-Interscience, 2006.
   - R. Gallager, Information Theory and Reliable Communication, Wiley, 1968.
   - I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd Edition, Cambridge University Press, 2011.
   - S. M. Moser, Information Theory (Lecture Notes), 4th Edition, ISI Lab, ETH Zürich, Switzerland, 2014.
   - R. Yeung, Information Theory and Network Coding, Springer, 2008.
   - A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011.
Homework

1. Roughly 5–6 problems every two weeks, 7 assignments in total.
2. Homework (HW) is usually released on Monday. The submission deadline is usually the next Wednesday, in class.
3. Late homework = 0 points. (Let me know in advance if you have difficulties.)
4. Everyone has to develop a detailed solution for one HW problem, documented in LaTeX and submitted 1 week after the HW is due.
   We will provide LaTeX templates, and you should discuss with the instructor the homework problem you are in charge of, to make sure the solution is correct.
5. This additional effort accounts for part of your homework grade.
Reading and Lecture Notes

1. Slides: Slides are usually released/updated every Sunday evening.
2. Readings: Each lecture has assigned readings. Reading is required: it is not enough to learn from the slides!
3. Go through the slides and the assigned readings before our lectures. It helps you learn better.
4. I recommend you get a copy of the textbook by Cover and Thomas. It is a good reference, and we will often assign readings from the book.
5. Other assigned readings may include Moser's lecture notes (available online) and relevant papers.
Interaction

1. In-class:
   - Language: This class is taught in English. However, to encourage interaction, feel free to ask questions in Mandarin. I will repeat your question in English (if necessary) and answer it in English.
   - Exercises: We put some exercises on the slides to help you learn and understand. Occasionally, I will call for volunteers to solve the exercises in class. Volunteers get bonus points.
2. Out-of-class:
   - Office Hours: Both the TA and I have 2-hour office hours per week. You are more than welcome to come visit us to ask questions, discuss research, chat, complain, etc. If you cannot make it to the regular office hours, send us an email to schedule a time slot. My schedule can be found on my website.
   - Email: Send us emails with a subject starting with "[NTU Fall15 IT]".
   - Feedback: There will be online polls during the semester to collect your feedback anonymously.
Course Outline

- Measures of Information: entropy, conditional entropy, relative entropy (KL divergence), mutual information.
- Lossless Source Coding: lossless source coding theorem, discrete memoryless sources, asymptotic equipartition property, typical sequences, Fano's inequality, converse proof, ergodic sources, entropy rate.
- Noisy Channel Coding: noisy channel coding theorem, discrete memoryless channels, random coding, typicality decoder, threshold decoder, error probability analysis, converse proof, channel with feedback.
- Channel Coding over Continuous-Valued Channels: channel coding with cost constraints, discretization technique, differential entropy, Gaussian channel capacity.
Course Outline

- Lossy Source Coding (Rate-Distortion Theory): distortion, rate-distortion tradeoff, typicality encoder, converse proof.
- Source-Channel Separation and Joint Source-Channel Coding
- Information Theory and Statistics: method of types, Sanov's theorem, large deviations, hypothesis testing, estimation, Cramér-Rao lower bound, non-parametric estimation.
- Data Compression: prefix-free codes, Kraft's inequality, Huffman codes, Lempel-Ziv compression.
- Capacity-Achieving Channel Codes: polar codes, LDPC codes.
- Selected Advanced Topics: network coding, compressed sensing, community detection, non-asymptotic information theory, etc.
Tentative Schedule

Week | Date      | Content                               | Remark
  1  | 09/15, 16 | Introduction; Measures of Information |
  2  | 09/22, 23 | Measures of Information               |
  3  | 09/29, 30 | Lossless Source Coding                | HW1 out
  4  | 10/06, 07 | Lossless Source Coding                | HW1 due
  5  | 10/13, 14 | Noisy Channel Coding                  | HW2 out
  6  | 10/20, 21 | Noisy Channel Coding                  | HW2 due
  7  | 10/27, 28 | Continuous-Valued Channel Coding      | HW3 out
  8  | 11/03, 04 | Continuous-Valued Channel Coding      | HW3 due
  9  | 11/10, 11 | Midterm Exam                          |
Tentative Schedule

Week | Date      | Content                           | Remark
 10  | 11/17, 18 | Lossy Source Coding               | HW4 out
 11  | 11/24, 25 | Joint Source-Channel Coding       | HW4 due
 12  | 12/01, 02 | Information Theory and Statistics | HW5 out
 13  | 12/08, 09 | Information Theory and Statistics | HW5 due
 14  | 12/15, 16 | Data Compression                  | HW6 out
 15  | 12/22, 23 | Data Compression; Polar Code      | HW6 due
 16  | 12/29, 30 | Polar Code                        | HW7 out
 17  | 01/05, 06 | Advanced Topics                   | HW7 due
 18  | 01/12, 13 | Final Exam                        |
1. Course Information
2. Overview
Claude E. Shannon (1916 – 2001)
Information theory is a mathematical theory of communication.
Information theory is THE mathematical theory of communication.
Origin of Information Theory
Shannon's landmark paper in 1948 is generally considered the "birth" of information theory. In the paper, Shannon made it clear that information theory is about the quantification of information in a communication system. In particular, it focuses on characterizing the necessary and sufficient condition for a destination terminal to be able to reproduce a message generated by a source terminal.
What is Information Theory about?
It is about the analysis of fundamental limits.

1. Stochastic modeling: It is a unified theory based on stochastic modeling (information source, noisy channel, etc.).
2. Theorems, not only definitions: It provides mathematical theorems on the optimal performance of algorithms (coding schemes), rather than merely definitions.
3. Sharp phase transition: It draws the boundary between what is possible to achieve and what is impossible, leading to math-driven system design.
It is about design driven by theory.

Engineering Design Driver: Information theory not only gives fundamental limits, but also provides guidelines suggesting how to achieve them. Some examples:
- Universal data compression (lossless compression)
- Error correcting codes (satellite communication)
- DSL modems
- Cellular systems (mobile communication)
- Wireless access networks
- Cryptography
…and much more!
Communication System

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Above is an abstract model of a communication system:
1. The source would like to deliver some message to the destination, where the message includes speech, image, video, audio, text, etc.
2. The channel is the physical medium that connects the source and the destination, such as cable, optical fiber, EM radiation, etc., and is usually subject to certain noise disturbances.
3. The encoder can carry out any processing of the source output, including compression, modulation, insertion of redundancy, etc.
4. The decoder can carry out any processing of the channel output to reproduce the source message.
A primary concern of information theory is the encoder and the decoder, both in terms of:
- how the encoder and the decoder function, and
- the existence or nonexistence of encoders and decoders that achieve a given level of performance.
Prior to the 1948 paper, the design of communication systems followed the analog paradigm: if the source produces an electromagnetic waveform, the destination should try its best to reconstruct this waveform, in order to extract useful information (usually, voice). This line of research was based on Fourier analysis and gave birth to sampling theory. Shannon asked:

   If the receiver knows that a sine wave of unknown frequency is to be communicated, why not simply send the frequency rather than the entire waveform?

Prior to Shannon, theorists and engineers were able to analyze the performance of certain choices of encoders/decoders, but had little knowledge of the ultimate limit. Shannon asked:

   For all possible encoders/decoders, what is the necessary and sufficient condition for the destination to be able to reconstruct the message sent from the source?
Shannon's View

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Key new insights due to Shannon's work:
- Shannon: "Information is the resolution of uncertainty." Indeed, the set of possible source outputs, rather than any particular output, is of primary interest.
- Introduction of an abstract mathematical model of a communication system based on random processes (hence, a stochastic model).
- Creation of the digital paradigm of communication system design, with the bit as the universal currency of information, by proposing and proving the source-channel separation theorem.
Stochastic Modeling

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

The stochastic modeling of a communication system comprises:
- Source: model the information source by random processes, where the data to be conveyed is drawn randomly from a given distribution.
- Channel: model the noisy channel by random processes, where the impact of noise is drawn randomly from a given distribution.

Why use random processes to model a communication system?
Shannon: "The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design."
Source-Channel Separation

[Block diagram: Source → Source Encoder → (bits) → Channel Encoder → Noisy Channel → Channel Decoder → (bits) → Source Decoder → Destination; the bit streams form a binary interface between source coders and channel coders]

Shannon showed that by splitting the coders into source coders and channel coders, the fundamental limit of the system remains the same. In other words, introducing a digital (binary) interface does not incur any loss of optimality, in terms of whether or not the destination can reproduce the source data.

Separation of source coding and channel coding simplifies engineering design: source coder design (data compression) and channel coder design (data transmission) can be carried out separately.
I have always wondered how on earth Shannon came up with the brilliant idea of separating source coding and channel coding.

A very likely answer: "Shannon is simply a genius."

A more down-to-earth one: "Shannon saw the essence of research: seek simplification first."
Original Block Diagram

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Simplification: Remove the Channel Noise!

[Same diagram with the noise removed: Source → Encoder → Channel → Decoder → Destination]

This step makes life much easier. Yet, it is still a non-trivial problem.
Source Coding (Data Compression)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination]

Features of source messages:
- Uncertainty: the destination has no idea what message is chosen by the source a priori.
- Redundancy: though randomly chosen, some choices are more likely, while others are less likely.

Goal: Remove the redundancy of the source message and represent it by a bit sequence, so that it can be delivered to the destination reliably.
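To make the notion of redundancy concrete, here is a small sketch (not from the lecture; the toy text is hypothetical) comparing the empirical zeroth-order entropy of English-like text against the bits a fixed-length code would spend per character:

```python
import math
from collections import Counter

# a toy source: English-like text with highly non-uniform symbol frequencies
text = "the quick brown fox jumps over the lazy dog " * 50
counts = Counter(text)
n = len(text)

# empirical (zeroth-order) entropy: average bits needed per character
H = -sum(c / n * math.log2(c / n) for c in counts.values())

# a fixed-length code spends log2(alphabet size) bits on every character
naive = math.log2(len(counts))

# non-uniform frequencies mean redundancy that a source coder can remove
assert H < naive
```

The gap between `H` and `naive` is exactly the redundancy that a source coder exploits.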
Source Coding (Data Compression)

[Same block diagram, annotated: the source emits s[1], …, s[N]; the source encoder outputs b[1], …, b[K]; the source decoder outputs ŝ[1], …, ŝ[N]]

Notation:
- {s[1], …, s[N]} represents the source message; each s[t] is called a "source symbol".
- {b[1], …, b[K]} represents the codeword, generated by the source encoder; each bit b[t] is called a "source codeword symbol (bit)".
- {ŝ[1], …, ŝ[N]} represents the reproduced source message at the destination.
Source Coding (Data Compression)

[Same block diagram as above]

Question: For a given N (# of source symbols), what is the minimum K (# of bits) needed to recover s[1], …, s[N] at the decoder?

It is not hard to show that the smallest K = Θ(N). (Check!)

The right (non-trivial) question to ask is: What is the minimum value of K/N?
Source Coding (Data Compression)

[Same block diagram as above]

Shannon answered the above question and characterized the necessary and sufficient condition for (lossless) source coding:

A Source Coding Theorem
The destination can reconstruct the source message losslessly
⟺ code rate R := K/N > the entropy rate of the source, H(S)

We will define entropy in Lecture 2; it is a quantity that can be computed from the distribution of the source random process {S[t] | t ∈ ℕ}.
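As an illustrative sketch (not part of the lecture; the source parameter and sample size are hypothetical), the theorem can be probed empirically. For an i.i.d. Bernoulli(p) source the entropy rate H(S) is the binary entropy of p, and the rate K/N achieved by any lossless compressor must stay above it:

```python
import math
import random
import zlib

random.seed(0)
p = 0.1          # Bernoulli(p) source (hypothetical parameter)
n = 80_000       # number of source symbols N

bits = [1 if random.random() < p else 0 for _ in range(n)]

# pack 8 source bits per byte, then let a generic lossless compressor try
raw = bytes(
    sum(b << i for i, b in enumerate(bits[j:j + 8]))
    for j in range(0, n, 8)
)
compressed = zlib.compress(raw, 9)

# entropy rate of an i.i.d. Bernoulli(p) source: H(S) = H2(p)
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
rate = 8 * len(compressed) / n   # achieved code rate K/N, in bits per symbol

# the compressor beats 1 bit/symbol, but cannot beat the entropy rate
assert H < rate < 1.0
```

Here zlib stands in for a good general-purpose source coder; its achieved rate sits between H(S) and the naive 1 bit per symbol, as the theorem predicts.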
Original Block Diagram

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Simplification′: Remove the Source Redundancy!

[Same diagram, with the source replaced by i.i.d. Bernoulli(1/2) bits, i.e., random bits, and Noise still entering the Channel]

It remains a highly non-trivial problem.
Channel Coding (Data Transmission)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination]

Features of the noisy channel:
- Noise: the channel input sent by the channel encoder is corrupted by the noise randomly, producing the channel output.
- Uniform messages: the input of the channel encoder is assumed WLOG to be a bit sequence with no redundancy, since source coding already removes all redundancy and converts the message to a bit sequence.

Goal: Add minimum redundancy so that messages (bit sequences) can be communicated over the noisy channel and decoded reliably.
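As a toy illustration of adding redundancy (not from the lecture; the crossover probability and code length below are hypothetical), a repetition code over a binary symmetric channel trades rate for reliability:

```python
import random

eps = 0.2  # BSC crossover probability (hypothetical)

def transmit_repetition(bit, n, rng):
    """Encode one data bit as n identical coded symbols, pass each
    through a BSC(eps), and decode by majority vote."""
    received = [bit ^ int(rng.random() < eps) for _ in range(n)]
    return int(sum(received) > n / 2)

def error_rate(n, trials=5000):
    rng = random.Random(42)
    return sum(transmit_repetition(1, n, rng) != 1 for _ in range(trials)) / trials

# more redundancy (lower rate K/N = 1/n) gives fewer decoding errors,
# but the rate tends to 0; channel coding theory shows we can do far
# better: any rate below capacity is achievable with vanishing error
assert error_rate(11) < error_rate(1)
```

Repetition is the crudest way to add redundancy; the point of the coding theorem below is that much smarter codes keep the rate bounded away from zero while still driving the error probability down.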
Channel Coding (Data Transmission)

[Same block diagram, annotated: the channel encoder maps b[1], …, b[K] to x[1], …, x[N]; the channel p(y|x) outputs y[1], …, y[N]; the channel decoder outputs b̂[1], …, b̂[K]]

Notation:
- {x[1], …, x[N]} represents the codeword; each x[t] is called a "coded symbol".
- {b[1], …, b[K]} represents the message; each bit b[t] is called a "data symbol (bit)".
- {y[1], …, y[N]} represents the channel output.
Channel Coding (Data Transmission)

[Same block diagram as above]

Question: For a given K (# of input bits), what is the minimum N (# of coded symbols) needed to recover b[1], …, b[K] at the decoder?

It turns out that N = Θ(K). However, proving this is already non-trivial.

Shannon further asked: What is the maximum value of K/N?
Channel Coding (Data Transmission)

[Same block diagram as above]

Shannon gave the necessary and sufficient condition for channel coding:

A Channel Coding Theorem
The (channel) decoder can decode the message reliably
⟺ code rate R := K/N < the channel capacity of the channel, C

We will define channel capacity later; it is a quantity obtained by maximizing the "mutual information" between X and Y, which can be computed from the conditional distribution of the channel.
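To preview how capacity is computed, here is a sketch (hypothetical parameters) that maximizes I(X;Y) = H(Y) − H(Y|X) over input distributions for a binary symmetric channel, recovering the well-known formula C = 1 − H2(ε):

```python
import math

def H2(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(px1, eps):
    """I(X;Y) for a BSC(eps) with input distribution P(X=1) = px1,
    using I(X;Y) = H(Y) - H(Y|X), where H(Y|X) = H2(eps)."""
    py1 = px1 * (1 - eps) + (1 - px1) * eps  # output distribution P(Y=1)
    return H2(py1) - H2(eps)

eps = 0.11  # crossover probability (hypothetical)

# brute-force maximization over a grid of input distributions
capacity = max(mutual_information_bsc(k / 1000, eps) for k in range(1001))

# the maximum is attained at the uniform input, giving C = 1 - H2(eps)
assert abs(capacity - (1 - H2(eps))) < 1e-9
```

For general discrete memoryless channels the maximization has no closed form and is solved numerically (e.g., by the Blahut-Arimoto algorithm, covered later in the literature); the BSC is a case where the optimum is the uniform input.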
Summary
- Information theory focuses on the quantitative aspects of information, not the qualitative aspects.
- Information theory is mainly about what is possible and what is impossible in communication systems.
- In information theory, one investigates problems in communication systems through the lens of probability theory and statistics.
- In this course, we mainly focus on discrete-time signals, not on continuous-time signals.
- Source-channel separation forms the basis of digital communication, where binary digits (bits) become the universal currency of information.