ORIE 4742 - Info Theory and Bayesian ML (February 8, 2021, Semester: Spring 2021)

Page 1: ORIE 4742 - Info Theory and Bayesian ML

ORIE 4742 - Info Theory and Bayesian ML

February 8, 2021

Semester: Spring 2021

Page 2: ORIE 4742 - Info Theory and Bayesian ML

essential course information

- instructor: Sid Banerjee, [email protected]

- TA: Spencer Peters, [email protected]

- lectures: MW 11:25am-12:40pm, Mann 107

- Zoom link

https://cornell.zoom.us/j/93025504345 (pwd: Shannon)

- website

https://piazza.com/cornell/spring2021/orie4742

Page 3: ORIE 4742 - Info Theory and Bayesian ML

the fine print

- grading

50% homeworks, 20% prelim, 25% project,

5% participation

- homeworks

5-6 homeworks (on average 2 weeks for each)

teams of up to 3

submit single Jupyter notebook, with theory answers in Markdown

submissions on https://cmsx.cs.cornell.edu

4 late days across homeworks, lowest grade dropped

- prelim

in class, tentatively March 29

no final exam

- project

use techniques learned in class on problem of your choosing

teams of up to 3, report due before finals

Page 4: ORIE 4742 - Info Theory and Bayesian ML

what is this class about

• Q1. given data, how can we learn how it was generated?

• Q2. how can we translate data and models into future decisions?

• Q3. what are the fundamental limits and design principles of

data-driven learning and decision-making?

our approach in this course: probabilistic modeling

– bayesian inference: unified paradigm for learning and decision-making

– information theory: tool for designing and understanding data systems

Page 8: ORIE 4742 - Info Theory and Bayesian ML

problem: communicating over a noisy channel

reading assignment: chapter 1 of Mackay

Page 9: ORIE 4742 - Info Theory and Bayesian ML

communicating over channels

Page 10: ORIE 4742 - Info Theory and Bayesian ML

the system’s solution

Page 11: ORIE 4742 - Info Theory and Bayesian ML

a toy model: the binary symmetric channel

[figure: the binary symmetric channel. An input bit (0 or 1) is received correctly with probability (1 - f) and flipped with probability f. credit: David Mackay]
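the channel above is easy to simulate; here is a minimal sketch (the function name `bsc` and the seeded generator are my own choices, not from the slides) that flips each transmitted bit independently with probability f:

```python
import random

def bsc(bits, f, rng=random.Random(0)):
    """Binary symmetric channel: flip each bit independently with
    probability f, transmit it unchanged with probability 1 - f."""
    return [b ^ (rng.random() < f) for b in bits]

# transmit 100,000 zeros through a BSC with f = 0.1;
# roughly 10% of the received bits come out as ones
received = bsc([0] * 100_000, 0.1)
empirical_flip_rate = sum(received) / len(received)
```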

Page 12: ORIE 4742 - Info Theory and Bayesian ML

ideas for encoding

Page 13: ORIE 4742 - Info Theory and Bayesian ML

repetition codes: encoding

credit: David Mackay

Page 14: ORIE 4742 - Info Theory and Bayesian ML

repetition codes: decoding

credit: David Mackay

Page 15: ORIE 4742 - Info Theory and Bayesian ML

repetition codes: inference
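decoding a repetition code is a small Bayesian inference problem: given the received copies, compute the posterior probability of the source bit. a sketch (the function name and the uniform prior are my assumptions; the likelihood is the BSC one, a factor f per disagreeing bit and (1 - f) per agreeing bit):

```python
def posterior_s1(r, f, prior1=0.5):
    """Posterior probability that the source bit s was 1, given the
    received bits r of a repetition code sent over a BSC with flip
    probability f."""
    def likelihood(s):
        p = 1.0
        for bit in r:
            p *= f if bit != s else 1 - f
        return p
    num = prior1 * likelihood(1)
    return num / (num + (1 - prior1) * likelihood(0))
```

with f = 0.1 and received bits [1, 0, 1] the posterior that s = 1 is 0.9, so (for f < 0.5) picking the most probable source bit coincides with majority-vote decoding.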

Page 16: ORIE 4742 - Info Theory and Bayesian ML

repetition codes: performance
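the performance of R_N under majority-vote decoding can be computed exactly: the decoder errs when more than half of the N transmitted copies are flipped. a sketch (function name mine):

```python
from math import comb

def repetition_error_prob(N, f):
    """Bit-error probability of the R_N repetition code (N odd) over a
    BSC(f): majority-vote decoding fails when (N+1)/2 or more of the
    N copies are flipped."""
    return sum(comb(N, k) * f**k * (1 - f)**(N - k)
               for k in range((N + 1) // 2, N + 1))
```

for R3 at f = 0.1 this gives 3f²(1 - f) + f³ = 0.028, at rate 1/3; driving p_b down to around 10⁻¹⁵ by repetition alone takes N on the order of 61, i.e. a rate of about 1/61.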

Page 17: ORIE 4742 - Info Theory and Bayesian ML

repetition codes: the rate-error plot

[figure: bit-error probability p_b versus rate for the repetition codes R1, R3, R5, ..., R61, shown on linear and logarithmic p_b scales; more useful codes lie toward high rate and low p_b. credit: David Mackay]

Page 18: ORIE 4742 - Info Theory and Bayesian ML

the (7,4) Hamming code

[figure: the (7,4) Hamming code: (a) the three-circle diagram with source bits s1-s4 and parity bits t5-t7, each circle containing an even number of ones; (b) a worked example codeword. credit: David Mackay]

Page 19: ORIE 4742 - Info Theory and Bayesian ML

the (7,4) Hamming code: performance

credit: David Mackay

distance between codewords

the minimum Hamming distance between any two distinct codewords is 3
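encoding and syndrome decoding of the (7,4) code fit in a few lines. the matrices below use the parity rules t5 = s1+s2+s3, t6 = s2+s3+s4, t7 = s1+s3+s4 (mod 2), which I believe match the three-circle figure; the function names are mine:

```python
import numpy as np

# generator matrix G = [I4 | P]: codeword t = s G (mod 2)
G = np.array([[1, 0, 0, 0, 1, 0, 1],
              [0, 1, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1],
              [0, 0, 0, 1, 0, 1, 1]])

# parity-check matrix H = [P.T | I3]: H t = 0 (mod 2) for every codeword
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

def encode(s):
    return (s @ G) % 2

def decode(r):
    """Syndrome decoding: a nonzero syndrome equals the column of H at
    the (single) flipped position; flip it back, return first 4 bits."""
    r = np.array(r) % 2
    syndrome = (H @ r) % 2
    if syndrome.any():
        pos = int(np.argmax((H.T == syndrome).all(axis=1)))
        r[pos] ^= 1
    return r[:4]
```

any single flipped bit among the 7 transmitted bits is corrected, which is exactly what minimum distance 3 buys.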

Page 21: ORIE 4742 - Info Theory and Bayesian ML

the rate-error plot

[figure: the same rate-error axes with H(7,4) and BCH codes added: BCH(15,7), BCH(31,16), BCH(511,76), BCH(1023,101); these reach far lower p_b at higher rates than the repetition codes. credit: David Mackay]

Page 22: ORIE 4742 - Info Theory and Bayesian ML

Shannon’s channel coding theorem

[figure: the rate-error plot split by the channel capacity C into an achievable region (rates below C) and a not-achievable region, with R1, R5, and H(7,4) marked; shown on linear and logarithmic p_b scales. credit: David Mackay]

Theorem (Claude Shannon, 1948)

every noisy channel has a capacity C > 0 such that communication with arbitrarily small error probability is possible at any rate below C
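for the binary symmetric channel the capacity has a closed form, C = 1 - H2(f), where H2 is the binary entropy function. a sketch:

```python
from math import log2

def binary_entropy(f):
    """H2(f) = -f log2(f) - (1-f) log2(1-f), in bits."""
    if f in (0.0, 1.0):
        return 0.0
    return -f * log2(f) - (1 - f) * log2(1 - f)

def bsc_capacity(f):
    """Capacity of the BSC with flip probability f, in bits per
    channel use: C = 1 - H2(f)."""
    return 1 - binary_entropy(f)
```

at f = 0.1 the capacity is about 0.53 bits per channel use, so Shannon promises essentially error-free communication at rates up to roughly 0.53, with no need for ever-longer repetition.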

Page 23: ORIE 4742 - Info Theory and Bayesian ML

noisy channel communication ↔ machine learning

Page 24: ORIE 4742 - Info Theory and Bayesian ML

redundancy ⇒ inference

credit: David Mackay

Page 25: ORIE 4742 - Info Theory and Bayesian ML

redundancy ⇒ inference

credit: David Mackay

Page 26: ORIE 4742 - Info Theory and Bayesian ML

the noisy channel model in ML

Page 27: ORIE 4742 - Info Theory and Bayesian ML

noisy channels in ML

credit: MNIST dataset

Page 28: ORIE 4742 - Info Theory and Bayesian ML

we are inherently bayesian

credit: quantamagazine.org, original image by Edward Adelson

“Tile A looks darker than tile B, though they are both the same shade

(connecting the squares makes this clearer). The brain uses coloring of

nearby tiles and location of the shadow to make inferences about the tile

colors. . . lead to the perception that A and B are shaded differently.”

Page 29: ORIE 4742 - Info Theory and Bayesian ML

what we hope to cover

– bayesian inference: unified paradigm for learning and decision-making

– information theory: tool for designing and understanding data systems

• basic probability review, and introducing information measures

• the source and channel coding theorems

• bayesian inference: priors, bayesian update, model selection

• bayesian classification and regression

• complex models: gaussian processes, artificial neural networks

• graphical models and markov random fields

• approximate inference: MCMC

• model-based decision-making: bayesian optimization, causal inference,

sequential decision-making and reinforcement learning

Page 35: ORIE 4742 - Info Theory and Bayesian ML

aids in getting excited about learning

the following will help you understand the larger context of what we will study

Page 36: ORIE 4742 - Info Theory and Bayesian ML

is this course right for you?

• prerequisites:

• linear algebra, calculus

• probability: ideally at the level of ORIE 3500

• programming: python

• caveat emptor:

- may not be ideal as a first course in ML

- we will focus on Bayesian methods, largely setting aside the alternative

‘frequentist’ methods

- will involve a fair bit of additional reading and programming, and

some ‘Bayesian philosophy’
