Course Information / Overview

Lecture 1: Introduction

I-Hsiang Wang
Department of Electrical Engineering, National Taiwan University

September 22, 2015
1 / 46 I-Hsiang Wang IT Lecture 1
Information Theory

Information Theory is a mathematical theory of information.

Information is usually obtained by receiving some "messages" (speech, text, images, etc.) from others. When obtaining information from a message, you may care about:
- What is the meaning of the message?
- How important is the message?
- How much information can I get from the message?

Information theory is about the quantification of information.
Information Theory

Information Theory is a mathematical theory of information (primarily for communication systems) that
- establishes the fundamental limits of communication systems (quantifies the amount of information that can be delivered from one party to another),
- is built upon probability theory and statistics,
- is mainly concerned with the ultimate performance limit (usually the rate of information processing) as certain resources (usually the total amount of time) scale to the asymptotic regime, given that the desired information is delivered "reliably".
In this course, we will
1. Establish solid foundations and intuitions of information theory,
2. Introduce explicit methods to achieve information-theoretic limits,
3. Demonstrate further applications of information theory beyond communications.

Later, we give a brief overview of information theory and the materials to be covered in this course.
1. Course Information
2. Overview
Logistics

1. Instructor: I-Hsiang Wang 王奕翔
   Email: [email protected]
   Office: MD-524 明達館 524 室
   Office Hours: 17:00 – 18:00, Monday and Tuesday
2. Lecture Time: 13:20 – 14:10 (6) Tuesday, and 10:20 – 12:10 (34) Wednesday
3. Lecture Location: EE2-225 電機二館 225 室
4. Course Website: http://homepage.ntu.edu.tw/~ihwang/Teaching/Fa15/IT.html
5. Prerequisites: Probability, Linear Algebra.
Logistics

6. Grading: Homework (35%), Midterm (30%), Final (35%)
7. References:
   - T. Cover and J. Thomas, Elements of Information Theory, 2nd Edition, Wiley-Interscience, 2006.
   - R. Gallager, Information Theory and Reliable Communication, Wiley, 1968.
   - I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd Edition, Cambridge University Press, 2011.
   - S. M. Moser, Information Theory (Lecture Notes), 4th Edition, ISI Lab, ETH Zürich, Switzerland, 2014.
   - R. Yeung, Information Theory and Network Coding, Springer, 2008.
   - A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011.
Homework

1. Roughly 5–6 problems every two weeks, 7 assignments in total.
2. Homework (HW) is usually released on Monday. The submission deadline is usually the next Wednesday, in class.
3. Late homework = 0 points. (Let me know in advance if you have difficulties.)
4. Everyone has to develop a detailed solution for one HW problem, documented in LaTeX and submitted 1 week after the HW is due.
   We will provide LaTeX templates, and you should discuss with the instructor the homework problem you are in charge of, to make sure the solution is correct.
5. This additional effort accounts for part of your homework grade.
Reading and Lecture Notes

1. Slides: Slides are usually released/updated every Sunday evening.
2. Readings: Each lecture has assigned readings. Reading is required: it is not enough to learn from the slides!
3. Go through the slides and the assigned readings before our lectures. It helps you learn better.
4. I recommend you get a copy of the textbook by Cover and Thomas. It is a good reference, and we will often assign readings from the book.
5. Other assigned readings may include Moser's lecture notes (available online) and relevant papers.
Interaction

1. In-class:
   - Language: This class is taught in English. However, to encourage interaction, feel free to ask questions in Mandarin. I will repeat your question in English (if necessary) and answer it in English.
   - Exercises: We put some exercises on the slides to help you learn and understand. Occasionally, I will call for volunteers to solve the exercises in class. Volunteers get bonus points.
2. Out-of-class:
   - Office Hours: Both the TA and I have 2-hour office hours per week. You are more than welcome to come visit us to ask questions, discuss research, chat, complain, etc. If you cannot make it to the regular office hours, send us an email to schedule a time slot. My schedule can be found on my website.
   - Email: Send us emails with a subject starting with "[NTU Fall15 IT]".
   - Feedback: There will be online polls during the semester to collect your feedback anonymously.
Course Outline

- Measures of Information: entropy, conditional entropy, relative entropy (KL divergence), mutual information.
- Lossless Source Coding: lossless source coding theorem, discrete memoryless sources, asymptotic equipartition property, typical sequences, Fano's inequality, converse proof, ergodic sources, entropy rate.
- Noisy Channel Coding: noisy channel coding theorem, discrete memoryless channels, random coding, typicality decoder, threshold decoder, error probability analysis, converse proof, channel with feedback.
- Channel Coding over Continuous-Valued Channels: channel coding with cost constraints, discretization technique, differential entropy, Gaussian channel capacity.
Course Outline

- Lossy Source Coding (Rate-Distortion Theory): distortion, rate-distortion tradeoff, typicality encoder, converse proof.
- Source-Channel Separation and Joint Source-Channel Coding
- Information Theory and Statistics: method of types, Sanov's theorem, large deviations, hypothesis testing, estimation, Cramér-Rao lower bound, non-parametric estimation.
- Data Compression: prefix-free codes, Kraft's inequality, Huffman codes, Lempel-Ziv compression.
- Capacity-Achieving Channel Codes: polar codes, LDPC codes.
- Selected Advanced Topics: network coding, compressed sensing, community detection, non-asymptotic information theory, etc.
Tentative Schedule

Week | Date      | Content                               | Remark
  1  | 09/15, 16 | Introduction; Measures of Information |
  2  | 09/22, 23 | Measures of Information               |
  3  | 09/29, 30 | Lossless Source Coding                | HW1 out
  4  | 10/06, 07 | Lossless Source Coding                | HW1 due
  5  | 10/13, 14 | Noisy Channel Coding                  | HW2 out
  6  | 10/20, 21 | Noisy Channel Coding                  | HW2 due
  7  | 10/27, 28 | Continuous-Valued Channel Coding      | HW3 out
  8  | 11/03, 04 | Continuous-Valued Channel Coding      | HW3 due
  9  | 11/10, 11 | Midterm Exam                          |
Tentative Schedule

Week | Date      | Content                           | Remark
 10  | 11/17, 18 | Lossy Source Coding               | HW4 out
 11  | 11/24, 25 | Joint Source-Channel Coding       | HW4 due
 12  | 12/01, 02 | Information Theory and Statistics | HW5 out
 13  | 12/08, 09 | Information Theory and Statistics | HW5 due
 14  | 12/15, 16 | Data Compression                  | HW6 out
 15  | 12/22, 23 | Data Compression; Polar Code      | HW6 due
 16  | 12/29, 30 | Polar Code                        | HW7 out
 17  | 01/05, 06 | Advanced Topics                   | HW7 due
 18  | 01/12, 13 | Final Exam                        |
1. Course Information
2. Overview
Claude E. Shannon (1916 – 2001)
Information theory is a mathematical theory of communication.
Information theory is THE mathematical theory of communication.
Origin of Information Theory
Shannon's landmark paper in 1948 is generally considered the "birth" of information theory. In the paper, Shannon made it clear that information theory is about the quantification of information in a communication system. In particular, it focuses on characterizing the necessary and sufficient condition for a destination terminal to be able to reproduce a message generated by a source terminal.
What is Information Theory about?
It is about the analysis of fundamental limits.

1. Stochastic modeling: It is a unified theory based on stochastic modeling (information source, noisy channel, etc.).
2. Theorems, not only definitions: It provides mathematical theorems on the optimal performance of algorithms (coding schemes), rather than merely definitions.
3. Sharp phase transition: It draws the boundary between what is possible to achieve and what is impossible, leading to math-driven system design.
It is about design driven by theory.

Engineering Design Driver: Information theory not only gives fundamental limits, but also provides guidelines suggesting how to achieve them. Some examples:
- Universal data compression (lossless compression)
- Error correcting codes (satellite communication)
- DSL modems
- Cellular systems (mobile communication)
- Wireless access networks
- Cryptography
…and much more!
Communication System

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Above is an abstract model of a communication system:
1. The source would like to deliver some message to the destination, where the message includes speech, image, video, audio, text, etc.
2. The channel is the physical medium that connects the source and the destination, such as cable, optical fiber, EM radiation, etc., and is usually subject to certain noise disturbances.
3. The encoder can carry out any processing of the source output, including compression, modulation, insertion of redundancy, etc.
4. The decoder can carry out any processing of the channel output to reproduce the source message.
A primary concern of information theory is the encoder and the decoder, both in terms of:
- how the encoder and the decoder function, and
- the existence or nonexistence of encoders and decoders that achieve a given level of performance.
Prior to the 1948 paper, the design of communication systems followed the analog paradigm: if the source produces an electromagnetic waveform, the destination should try its best to reconstruct this waveform, in order to extract useful information (usually, voice). This line of research was based on Fourier analysis and gave birth to sampling theory. Shannon asked:

   If the receiver knows that a sine wave of unknown frequency is to be communicated, why not simply send the frequency rather than the entire waveform?

Prior to Shannon, theorists and engineers were able to analyze the performance of certain choices of encoders/decoders, but had little knowledge of the ultimate limit. Shannon asked:

   For all possible encoders/decoders, what is the necessary and sufficient condition for the destination to be able to reconstruct the message sent from the source?
Shannon's View

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Key new insights due to Shannon's work:
- Shannon: "Information is the resolution of uncertainty." Indeed, the set of possible source outputs, rather than any particular output, is of primary interest.
- Introduction of an abstract mathematical model of a communication system based on random processes (hence, a stochastic model).
- Creation of the digital paradigm of communication system design, with the bit as the universal currency of information, by proposing and proving the source-channel separation theorem.
Stochastic Modeling

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

The stochastic modeling of a communication system comprises:
- Source: model the information source by random processes, where the data to be conveyed is drawn randomly from a given distribution.
- Channel: model the noisy channel by random processes, where the impact of noise is drawn randomly from a given distribution.

Why use random processes to model a communication system?
Shannon: "The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design."
Source-Channel Separation

[Block diagram: Source → Source Encoder → (bits) → Channel Encoder → Noisy Channel → Channel Decoder → (bits) → Source Decoder → Destination; the bit streams form a binary interface between source coders and channel coders]

Shannon showed that by splitting the coders into source coders and channel coders, the fundamental limit of the system remains the same. In other words, introducing a digital (binary) interface does not incur any loss of optimality, in terms of whether or not the destination can reproduce the source data.

Separation of source coding and channel coding simplifies engineering design: source coder design (data compression) and channel coder design (data transmission) can be carried out separately.
I have always wondered how on earth Shannon came up with the brilliant idea of separating source coding and channel coding.

A very likely answer: "Shannon is simply a genius."

A more down-to-earth one: "Shannon saw the essence of research: seek simplification first."
Original Block Diagram

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Simplification: Remove the Channel Noise!

[Same diagram with the noise removed: Source → Encoder → Channel → Decoder → Destination]

This step makes life much easier. Yet, it is still a non-trivial problem.
Source Coding (Data Compression)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination]

Features of source messages:
- Uncertainty: the destination has no idea what message is chosen by the source a priori.
- Redundancy: though randomly chosen, some choices are more likely, while others are less likely.

Goal: Remove the redundancy of the source message and represent it by a bit sequence, so that it can be delivered to the destination reliably.
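To make the notion of redundancy concrete, here is a small sketch (not from the lecture; the toy text is hypothetical) comparing the empirical zeroth-order entropy of English-like text against the bits a fixed-length code would spend per character:

```python
import math
from collections import Counter

# a toy source: English-like text with highly non-uniform symbol frequencies
text = "the quick brown fox jumps over the lazy dog " * 50
counts = Counter(text)
n = len(text)

# empirical (zeroth-order) entropy: average bits needed per character
H = -sum(c / n * math.log2(c / n) for c in counts.values())

# a fixed-length code spends log2(alphabet size) bits on every character
naive = math.log2(len(counts))

# non-uniform frequencies mean redundancy that a source coder can remove
assert H < naive
```

The gap between `H` and `naive` is exactly the redundancy that a source coder exploits.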
Source Coding (Data Compression)

[Same block diagram, annotated: the source emits s[1], …, s[N]; the source encoder outputs b[1], …, b[K]; the source decoder outputs ŝ[1], …, ŝ[N]]

Notation:
- {s[1], …, s[N]} represents the source message; each s[t] is called a "source symbol".
- {b[1], …, b[K]} represents the codeword, generated by the source encoder; each bit b[t] is called a "source codeword symbol (bit)".
- {ŝ[1], …, ŝ[N]} represents the reproduced source message at the destination.
Source Coding (Data Compression)

[Same block diagram as above]

Question: For a given N (# of source symbols), what is the minimum K (# of bits) needed to recover s[1], …, s[N] at the decoder?

It is not hard to show that the smallest K = Θ(N). (Check!)

The right (non-trivial) question to ask is: What is the minimum value of K/N?
Source Coding (Data Compression)

[Same block diagram as above]

Shannon answered the above question and characterized the necessary and sufficient condition for (lossless) source coding:

A Source Coding Theorem
The destination can reconstruct the source message losslessly
⟺ code rate R := K/N > the entropy rate of the source, H(S)

We will define entropy in Lecture 2; it is a quantity that can be computed from the distribution of the source random process {S[t] | t ∈ ℕ}.
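As an illustrative sketch (not part of the lecture; the source parameter and sample size are hypothetical), the theorem can be probed empirically. For an i.i.d. Bernoulli(p) source the entropy rate H(S) is the binary entropy of p, and the rate K/N achieved by any lossless compressor must stay above it:

```python
import math
import random
import zlib

random.seed(0)
p = 0.1          # Bernoulli(p) source (hypothetical parameter)
n = 80_000       # number of source symbols N

bits = [1 if random.random() < p else 0 for _ in range(n)]

# pack 8 source bits per byte, then let a generic lossless compressor try
raw = bytes(
    sum(b << i for i, b in enumerate(bits[j:j + 8]))
    for j in range(0, n, 8)
)
compressed = zlib.compress(raw, 9)

# entropy rate of an i.i.d. Bernoulli(p) source: H(S) = H2(p)
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
rate = 8 * len(compressed) / n   # achieved code rate K/N, in bits per symbol

# the compressor beats 1 bit/symbol, but cannot beat the entropy rate
assert H < rate < 1.0
```

Here zlib stands in for a good general-purpose source coder; its achieved rate sits between H(S) and the naive 1 bit per symbol, as the theorem predicts.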
Original Block Diagram

[Block diagram: Source → Encoder → Channel → Decoder → Destination, with Noise entering the Channel]

Simplification′: Remove the Source Redundancy!

[Same diagram, with the source replaced by i.i.d. Bernoulli(1/2) bits, i.e., random bits, and Noise still entering the Channel]

It remains a highly non-trivial problem.
Channel Coding (Data Transmission)

[Block diagram: Source → Source Encoder → Channel Encoder → Noisy Channel → Channel Decoder → Source Decoder → Destination]

Features of the noisy channel:
- Noise: the channel input sent by the channel encoder is corrupted by the noise randomly, producing the channel output.
- Uniform messages: the input of the channel encoder is assumed WLOG to be a bit sequence with no redundancy, since source coding already removes all redundancy and converts the message to a bit sequence.

Goal: Add minimum redundancy so that messages (bit sequences) can be communicated over the noisy channel and decoded reliably.
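As a toy illustration of adding redundancy (not from the lecture; the crossover probability and code length below are hypothetical), a repetition code over a binary symmetric channel trades rate for reliability:

```python
import random

eps = 0.2  # BSC crossover probability (hypothetical)

def transmit_repetition(bit, n, rng):
    """Encode one data bit as n identical coded symbols, pass each
    through a BSC(eps), and decode by majority vote."""
    received = [bit ^ int(rng.random() < eps) for _ in range(n)]
    return int(sum(received) > n / 2)

def error_rate(n, trials=5000):
    rng = random.Random(42)
    return sum(transmit_repetition(1, n, rng) != 1 for _ in range(trials)) / trials

# more redundancy (lower rate K/N = 1/n) gives fewer decoding errors,
# but the rate tends to 0; channel coding theory shows we can do far
# better: any rate below capacity is achievable with vanishing error
assert error_rate(11) < error_rate(1)
```

Repetition is the crudest way to add redundancy; the point of the coding theorem below is that much smarter codes keep the rate bounded away from zero while still driving the error probability down.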
Channel Coding (Data Transmission)

[Same block diagram, annotated: the channel encoder maps b[1], …, b[K] to x[1], …, x[N]; the channel p(y|x) outputs y[1], …, y[N]; the channel decoder outputs b̂[1], …, b̂[K]]

Notation:
- {x[1], …, x[N]} represents the codeword; each x[t] is called a "coded symbol".
- {b[1], …, b[K]} represents the message; each bit b[t] is called a "data symbol (bit)".
- {y[1], …, y[N]} represents the channel output.
Channel Coding (Data Transmission)

[Same block diagram as above]

Question: For a given K (# of input bits), what is the minimum N (# of coded symbols) needed to recover b[1], …, b[K] at the decoder?

It turns out that N = Θ(K). However, proving this is already non-trivial.

Shannon further asked: What is the maximum value of K/N?
Channel Coding (Data Transmission)

[Same block diagram as above]

Shannon gave the necessary and sufficient condition for channel coding:

A Channel Coding Theorem
The (channel) decoder can decode the message reliably
⟺ code rate R := K/N < the channel capacity of the channel, C

We will define channel capacity later; it is a quantity obtained by maximizing the "mutual information" between X and Y, which can be computed from the conditional distribution of the channel.
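To preview how capacity is computed, here is a sketch (hypothetical parameters) that maximizes I(X;Y) = H(Y) − H(Y|X) over input distributions for a binary symmetric channel, recovering the well-known formula C = 1 − H2(ε):

```python
import math

def H2(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information_bsc(px1, eps):
    """I(X;Y) for a BSC(eps) with input distribution P(X=1) = px1,
    using I(X;Y) = H(Y) - H(Y|X), where H(Y|X) = H2(eps)."""
    py1 = px1 * (1 - eps) + (1 - px1) * eps  # output distribution P(Y=1)
    return H2(py1) - H2(eps)

eps = 0.11  # crossover probability (hypothetical)

# brute-force maximization over a grid of input distributions
capacity = max(mutual_information_bsc(k / 1000, eps) for k in range(1001))

# the maximum is attained at the uniform input, giving C = 1 - H2(eps)
assert abs(capacity - (1 - H2(eps))) < 1e-9
```

For general discrete memoryless channels the maximization has no closed form and is solved numerically (e.g., by the Blahut-Arimoto algorithm, covered later in the literature); the BSC is a case where the optimum is the uniform input.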
Summary
- Information theory focuses on the quantitative aspects of information, not the qualitative aspects.
- Information theory is mainly about what is possible and what is impossible in communication systems.
- In information theory, one investigates problems in communication systems through the lens of probability theory and statistics.
- In this course, we mainly focus on discrete-time signals, not on continuous-time signals.
- Source-channel separation forms the basis of digital communication, where binary digits (bits) become the universal currency of information.