

Efficient Coding of Natural Sounds

Michael Lewicki

Center for the Neural Basis of Cognition & Department of Computer Science

Carnegie Mellon University

October 21, 2002

How does the brain encode complex sensory signals?


Outline

Motivations

Efficient coding theory

Application to natural sounds

Interpretation of experimental data

Efficient coding in population spike codes


A wing would be a most mystifying structure if one did not know that birds flew.

Horace Barlow, 1961


Natural signals are redundant

[Figure: correlation of adjacent pixels; scatter of neighboring pixel values, both axes 0 to 1]

Efficient coding hypothesis (Attneave, 1954; Barlow, 1961; et al):

Sensory systems encode only non-redundant structure


Why code efficiently?

Information bottleneck of sensory coding:

• restrictions on information flow rate

– channel capacity of sensory nerves
– computational bottleneck
– 5 × 10^6 → 40–50 bits/sec

• facilitate pattern recognition

– independent features are more informative
– better sensory codes could simplify further processing

• other ideas

– efficient energy use
– faster processing time

How do we use this hypothesis to predict sensory codes?


A simple example: efficient coding of a single input

(from Atick, 1992)

How to set sensitivity?

• too high ⇒ response saturated
• too low ⇒ range underutilized

• inputs follow the distribution of the sensory environment

• encode so that output levels are used with equal frequency

• each response state has equal area (⇒ equal probability)

• continuum limit is the cumulative pdf of the input distribution

For $y = g(c)$:

$$\frac{y}{y_{\max}} = \int_{c_{\min}}^{c} P(c')\,dc'$$
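As an illustration of this equalization principle, here is a minimal NumPy sketch (not from the talk): it builds the response function y(c) from the empirical cumulative distribution of a stand-in contrast ensemble, so that output levels are used with roughly equal frequency. The lognormal `contrasts` array is an assumed placeholder for measured natural-scene contrasts.

```python
import numpy as np

# Sketch: derive the response function y(c) that equalizes output usage,
# i.e. y / y_max equals the cumulative distribution of the input ensemble.
rng = np.random.default_rng(0)
contrasts = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)  # stand-in ensemble

# Empirical cumulative distribution of the inputs.
sorted_c = np.sort(contrasts)
cdf = np.arange(1, sorted_c.size + 1) / sorted_c.size

def response(c, y_max=1.0):
    """Histogram-equalizing response: y = y_max * P(C <= c)."""
    return y_max * np.interp(c, sorted_c, cdf)

# Outputs are approximately uniform on [0, y_max], so every
# response level is used with roughly equal frequency.
y = response(contrasts)
print(np.histogram(y, bins=10, range=(0, 1))[0])  # roughly equal counts per bin
```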


Testing the theory: Laughlin, 1981

Laughlin, 1981:

• predict response of fly LMC (large monopolar cell)
– interneuron in the compound eye

• output is graded potential

• collect natural scenes to estimate the stimulus pdf

• predict contrast response function
⇒ fly LMC transmits information efficiently

What about complex sensory patterns?


V1 receptive fields are consistent with efficient coding theory

V1 receptive fields are well-fit by 2D Gabor functions (Jones and Palmer, 1987).

Does this yield an efficient code?


Coding images with pixels (Daugman, 1988)

Lena image; histogram of pixel values. Entropy = 7.57 bits.

High entropy means high redundancy ⇒ a very inefficient code


Recoding with Gabor functions (Daugman, 1988)

Pixel entropy = 7.57 bits. Recoding with 2D Gabor functions: filter output entropy = 2.55 bits.
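For concreteness, a small sketch of how such entropies can be estimated from histograms of code values; the `pixels` and `gabor_out` arrays are hypothetical stand-ins (uniform and Laplacian draws), not Daugman's image data.

```python
import numpy as np

def histogram_entropy(values, bins=256):
    """Estimate entropy (bits per sample) from a histogram of code values."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]                       # ignore empty bins
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(5)
pixels = rng.integers(0, 256, size=100_000)        # broad, pixel-like distribution
gabor_out = rng.laplace(scale=3.0, size=100_000)   # sparse, filter-like distribution
print(histogram_entropy(pixels), histogram_entropy(gabor_out))  # high vs. lower entropy
```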

Can these codes be predicted?


Sparse coding of natural images (Olshausen and Field, 1996)

[Figure: network schematic of visual input units and receptive fields, before and after learning on natural scenes]

Adapt population of receptive fields to

• accurately encode an ensemble of natural images
• maximize the sparseness of the output, i.e. minimize entropy


Theory predicts entire population of receptive fields

(Lewicki and Olshausen, 1999)

Population of receptive fields (black = inhibitory; white = excitatory).

Overlaid response property schematics.


Algorithm selects best of many possible sensory codes

[Figure: example basis functions for Learned, Gabor, Wavelet, Fourier, Haar, and PCA codes]

(Lewicki and Olshausen, 1999)

Theoretical perspective: not edge “detectors” but an efficient way to describe natural, complex images.


[Figure: comparing codes on natural images; estimated bits per pixel for Gabor, wavelet, Haar, Fourier, PCA, and learned codes]


Efficient coding of natural sounds

Efficient coding: focus on coding waveform directly

Goal:

Predict the optimal transformation of the acoustic waveform from the statistics of the acoustic environment.


Why encode sound by frequency?

Auditory tuning curves.


A simple model of waveform encoding

Data consists of waveform segments sampled randomly from a sound ensemble:

$\mathbf{x}_{1:N}$

Filterbank model:

$$a_i(t) = \sum_{\tau=0}^{N-1} x(t-\tau)\, h_i(\tau)$$

The model only describes signals within the window of analysis.

How do we derive the filter shapes $h_i(t)$ that optimize coding efficiency?
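A minimal sketch of this filterbank model in NumPy, assuming a segment `x` and a filter matrix `H` (one row per filter); within the analysis window the responses are just discrete convolutions.

```python
import numpy as np

def filterbank_responses(x, H):
    """Filterbank model a_i(t) = sum_tau x(t - tau) h_i(tau).

    x : 1-D signal segment of length N
    H : array of shape (num_filters, N), one filter h_i per row
    Returns responses of shape (num_filters, N); only the analysis window is modeled.
    """
    return np.stack([np.convolve(x, h, mode="full")[: len(x)] for h in H])

# Hypothetical example: random filters applied to a random segment.
rng = np.random.default_rng(1)
x = rng.standard_normal(128)
H = rng.standard_normal((8, 128))
a = filterbank_responses(x, H)
print(a.shape)  # (8, 128)
```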


Information theoretic viewpoint

Use Shannon’s source coding theorem.

$$L = E[\ell(X)] \;\ge\; \sum_x p(x)\log\frac{1}{q(x)} \;=\; \sum_x p(x)\log\frac{p(x)}{q(x)} + \sum_x p(x)\log\frac{1}{p(x)} \;=\; D_{\mathrm{KL}}(p\,\|\,q) + H(p)$$

If the model density q(x) equals the true density p(x), then D_KL = 0,
⇒ q(x) attains the lower bound on the average code length.

greater coding efficiency ⇔ more learned structure
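A small numerical check of the decomposition above, using hypothetical toy distributions p and q: the expected code length under the model q equals the source entropy plus the KL penalty.

```python
import numpy as np

# Toy check of L >= D_KL(p || q) + H(p) for a discrete source.
p = np.array([0.5, 0.25, 0.125, 0.125])   # true distribution (hypothetical)
q = np.array([0.25, 0.25, 0.25, 0.25])    # model distribution (hypothetical)

H_p = -np.sum(p * np.log2(p))              # source entropy (bits)
D_kl = np.sum(p * np.log2(p / q))          # extra cost of coding with the wrong model
L_bound = np.sum(p * np.log2(1.0 / q))     # expected code length using q

print(H_p, D_kl, L_bound)                  # L_bound == H_p + D_kl
assert np.isclose(L_bound, H_p + D_kl)
```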

Principle

Good codes capture the statistical distribution of sensory patterns.

How do we describe the distribution?


Describing signals with a simple statistical model

Goal is to encode the data to desired precision

$$\mathbf{x} = \vec{a}_1 s_1 + \vec{a}_2 s_2 + \cdots + \vec{a}_L s_L + \vec{\varepsilon} = A\mathbf{s} + \boldsymbol{\varepsilon}$$

Can solve for $\mathbf{s}$ in the no-noise case:

$$\mathbf{s} = A^{-1}\mathbf{x}$$

Want algorithm to choose optimal A (i.e. the basis matrix).
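A minimal sketch of the noiseless version of this model: encode a segment by inverting an assumed basis matrix A, then reconstruct. The random A and x here are placeholders for a learned basis and real waveform segments.

```python
import numpy as np

# Noiseless linear model x = A s: encode by s = A^{-1} x, decode by x_hat = A s.
rng = np.random.default_rng(2)
dim = 64
A = rng.standard_normal((dim, dim))        # hypothetical (square, invertible) basis
x = rng.standard_normal(dim)               # one waveform segment

s = np.linalg.solve(A, x)                  # coefficients, equivalent to A^{-1} x
x_hat = A @ s                              # reconstruction

print(np.allclose(x, x_hat))               # True in the no-noise case
```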


Algorithm for deriving efficient codes

Learning objective:

maximize coding efficiency

⇒ maximize $P(\mathbf{x}\mid A)$ over $A$ (the basis for the analysis window, or filter shapes).

Probability of pattern ensemble is:

$$P(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N \mid A) = \prod_k P(\mathbf{x}_k \mid A)$$

To obtain $P(\mathbf{x}\mid A)$, marginalize over $\mathbf{s}$:

$$P(\mathbf{x}\mid A) = \int d\mathbf{s}\; P(\mathbf{x}\mid A, \mathbf{s})\, P(\mathbf{s}) = \frac{P(\mathbf{s})}{|\det A|}$$

Using independent component analysis (ICA) to optimize $A$:

$$\Delta A \propto A A^T \frac{\partial}{\partial A} \log P(\mathbf{x}\mid A) = -A\,(\mathbf{z}\mathbf{s}^T - I),$$

where $\mathbf{z} = (\log P(\mathbf{s}))'$. Use $P(s_i) \sim \mathrm{ExPwr}(s_i \mid \mu, \sigma, \beta_i)$.

This learning rule:

• learns features that capture the most structure

• optimizes the efficiency of the code
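A rough NumPy sketch of one step of this learning rule, following the update ΔA ∝ -A(z s^T - I) shown above with a generalized-Gaussian (exponential power) prior on the coefficients. The function name, batch averaging, learning rate, and the zero-mean, unit-scale prior are illustrative assumptions, not the original implementation.

```python
import numpy as np

def ica_update(A, X, beta=1.0, lr=1e-3):
    """One natural-gradient ICA step, delta_A proportional to -A (z s^T - I).

    A    : (dim, dim) basis matrix
    X    : (dim, n_samples) batch of waveform segments
    beta : shape parameter of an exponential power prior on s (assumed zero-mean, unit scale)
    """
    S = np.linalg.solve(A, X)                        # coefficients s = A^{-1} x
    # z_i = d/ds_i log P(s_i) for P(s) ~ exp(-|s|^beta)
    Z = -beta * np.sign(S) * np.abs(S) ** (beta - 1)
    n = X.shape[1]
    grad = -A @ (Z @ S.T / n - np.eye(A.shape[0]))   # slide's natural-gradient direction
    return A + lr * grad

# Hypothetical usage on random, sparse data:
rng = np.random.default_rng(3)
A = np.eye(16) + 0.01 * rng.standard_normal((16, 16))
X = rng.laplace(size=(16, 256))
A = ica_update(A, X, beta=1.0)
```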


Modeling Non-Gaussian distributions with ICA

[Figure: example coefficient distributions s1 (β = 2) and s2 (β = 4) fit with exponential power densities]

• Typical coefficient distributions of natural signals are non-Gaussian.

• Independent component analysis (ICA) can describe the statistical structure of these non-Gaussian distributions.

• The distribution is fit by optimizing the filter shapes.

• Unlike PCA, the vectors are not restricted to be orthogonal.

• This permits a much better description of the actual distribution of natural signals.



Efficient coding of natural sounds: Learning procedure

To derive the filters:

• select sound segments randomly from sound ensemble

• optimize filter shapes to maximize coding efficiency

What sounds should we use?

What are auditory systems adapted for?

• localization / environmental sounds?

• communication / vocalizations?

• specific tasks, e.g. sound discrimination?

We used the following sound ensembles:

• non-harmonic environmental sounds (e.g. footsteps, stream sounds, etc.)

• animal vocalizations (rainforest mammals, e.g. chirping, screeching, cries, etc.)

• speech (samples from 100 male and female speakers from the TIMIT corpus)


Results of adapting filters to different sound classes

Efficient filters for environmental sounds:

Efficient filters for speech:

Efficient filters for animal vocalizations:

• Each result shows only a subset

• Auditory nerve filters best match those derived from environmental sounds and speech

• learning movie


Upsampling removes aliasing due to periodic sampling


A combined ensemble: env. sounds and vocalizations

Efficient filters for the combined ensemble:

Efficient filters for speech:

Can vary along the continuum by changing the relative proportion; the best match is 2:1
⇒ speech is well-matched to the auditory code


Can decorrelating models also explain data?

Redundancy reduction models that adapt weights to decorrelate output activities assume a Gaussian model:

$$\mathbf{x} \sim \mathcal{N}(\mathbf{x} \mid \mu, \sigma)$$

Under this model, the filters can be derived with principal component analysis.
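For comparison, a sketch of how the decorrelating (PCA) filters would be computed, assuming a matrix of waveform segments; the random `segments` here stand in for samples from the sound ensemble.

```python
import numpy as np

# Under a Gaussian model, decorrelating filters are the principal components
# of the segment covariance.
rng = np.random.default_rng(4)
segments = rng.standard_normal((5000, 128))       # stand-in for real sound segments

segments = segments - segments.mean(axis=0)
cov = segments.T @ segments / segments.shape[0]   # N x N covariance
eigvals, eigvecs = np.linalg.eigh(cov)            # columns are the PCA filters

# Sort filters by decreasing variance; for stationary sounds these come out
# Fourier-like (global sinusoids), not time-frequency localized.
order = np.argsort(eigvals)[::-1]
pca_filters = eigvecs[:, order].T                 # one filter per row
print(pca_filters.shape)
```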

PCs of Environmental Sounds:

Corresponding Power Spectra:

⇒ just decorrelating the outputs does not yield time-frequency localized filters.


Why doesn’t PCA work?

Check assumptions:

$$\mathbf{x} = A\mathbf{s} \quad\text{and}\quad \mathbf{x} \sim \mathcal{N}(\mathbf{x} \mid \mu, \sigma)$$

⇒ distribution of s should also be Gaussian.

Actual distribution of filter coefficients:


Efficient coding of sparse noise

Learned sparse noise filters:

Efficient filters are delta functions that represent different time points in the analysis window.

...but what about the auditory system?


Auditory filters estimated by reverse correlation

[Figure: cascade model: x(t) → linear system → y(t) → static nonlinearity → z(t) → stochastic pulse generator → s(t)]

Cat auditory “revcor” filters:

[Figure: example cat auditory revcor filters, plotted over 0 to 20 ms]

(de Boer and de Jongh, 1978; Carney and Yin, 1988)


Revcor filter predictions of auditory nerve response

(from de Boer and de Jongh, 1978).

• stimulus is white noise

• histogram: measured auditory nerve response

• smooth curve: predicted response

Conclusion:

Shape and distribution of revcor filters account for a large part of the auditory sensory code.

We want to match more than just individual filters:

How do we characterize the population?


[Figure: derived filters (time, 0 to 8 ms) and their power spectra (magnitude in dB vs. frequency, 0.5 to 10 kHz) for vocalizations, speech, and environmental sounds]

Schematic time-frequency distributions

[Figure: schematic tilings of the time-frequency plane for a Fourier basis and a typical wavelet basis]


Animal vocalizations:

[Figure: time-frequency tiling of the derived filters, time 0 to 8 ms, frequency 0 to 7 kHz]

Speech:

[Figure: time-frequency tiling of the derived filters, time 0 to 8 ms, frequency 0 to 8 kHz]

Environmental sounds:

[Figure: time-frequency tiling of the derived filters, time 0 to 8 ms, frequency 0 to 7 kHz]

Tiling trends follow power law

[Figure: bandwidth (kHz) and temporal envelope (ms) vs. center frequency (kHz) on log-log axes; '×' = environmental sounds, '◦' = speech, '+' = vocalizations]


Does equalization of power explain these data?

Average power spectra:

[Figure: mean power (dB) vs. frequency (kHz) for environmental sounds (es), vocalizations (voc), and speech (sp)]

Equal power across frequency bands:

[Figure: bandwidth (kHz) vs. frequency (kHz) implied by equal power per band, for es, voc, and sp]


Comparison to auditory population code

Cat auditory nerves

[Figure: Q10dB vs. characteristic frequency (kHz) for cat auditory nerve fibers (Evans, 1975; Rhode and Smith, 1985)]

Filter sharpness characterizes how bandwidth changes as a function of frequency:

$$Q_{10\,\mathrm{dB}} = f_c / w_{10\,\mathrm{dB}}$$
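A rough sketch of how Q10dB can be estimated for a filter from its impulse response; the FFT-based bandwidth estimate and the gammatone-like test filter are illustrative assumptions, not the analysis code behind the figures.

```python
import numpy as np

def q10db(h, fs):
    """Q10dB = f_c / bandwidth measured 10 dB below the peak of |H(f)|.

    h  : impulse response of the filter (1-D array)
    fs : sampling rate in Hz
    """
    spectrum = np.abs(np.fft.rfft(h, n=4096))
    freqs = np.fft.rfftfreq(4096, d=1.0 / fs)
    power_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)

    peak = np.argmax(power_db)
    above = np.flatnonzero(power_db >= -10.0)       # bins within 10 dB of the peak
    w10 = freqs[above.max()] - freqs[above.min()]   # crude 10-dB bandwidth estimate
    return freqs[peak] / w10

# Hypothetical example: a gammatone-like filter centered near 1 kHz.
fs = 16000
t = np.arange(0, 0.02, 1.0 / fs)
h = t**3 * np.exp(-2 * np.pi * 200 * t) * np.cos(2 * np.pi * 1000 * t)
print(round(q10db(h, fs), 2))
```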

Derived filters

[Figure: Q10dB vs. center frequency (kHz) for the derived filters; '+' = vocalizations, '◦' = speech, '×' = environmental sounds]


Summary

Information theory and efficient coding:

• can be used to derive optimal codes for different pattern classes.

• explains important properties of sensory codes in both the auditory and visual systems.

• gives insight into how our sensory systems are adapted to the natural environment.

Caveats

• Codes can only be derived within a small window

• Does not explain non-linear aspects of coding

• Models do not capture higher order structure


Coding natural sounds with spikes

Addressing some limitations of the current theory

The current model assumes the sound waveform is divided into blocks:

[Figure: sound waveform (amplitude vs. time) divided into fixed analysis blocks]

Problems with block coding:

• signal structure is arbitrarily aligned
• code depends on block alignment
• difficult to encode non-periodic structure, e.g. rapid onsets


An efficient, shift-invariant model

The signal is modeled by a sum of events plus noise:

$$x(t) = s_1\phi_1(t-\tau_1) + \cdots + s_M\phi_M(t-\tau_M) + \varepsilon(t)$$

The events φm(t):

• can be placed at arbitrary time points τm

• are scaled by coefficients sm
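A minimal sketch of evaluating this shift-invariant model: reconstruct x(t) by summing scaled kernels placed at arbitrary event times. The kernel shapes and event list below are hypothetical.

```python
import numpy as np

def reconstruct(events, kernels, length):
    """Shift-invariant model x(t) = sum_m s_m * phi_m(t - tau_m).

    events  : list of (kernel_index, tau, s) triples, each one a spike-like event
    kernels : list of 1-D arrays phi_m
    length  : number of samples in the reconstructed signal
    """
    x = np.zeros(length)
    for m, tau, s in events:
        phi = kernels[m]
        end = min(tau + len(phi), length)
        x[tau:end] += s * phi[: end - tau]        # place a scaled kernel at time tau
    return x

# Hypothetical usage: two kernels, three events at arbitrary times.
t = np.arange(64)
kernels = [np.exp(-t / 8.0) * np.sin(2 * np.pi * t / 16.0),
           np.exp(-t / 4.0) * np.sin(2 * np.pi * t / 8.0)]
events = [(0, 10, 1.5), (1, 40, -0.8), (0, 200, 0.5)]
x = reconstruct(events, kernels, length=256)
print(x.shape)
```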


Solution after optimization: 105 dB SNR

[Figure: event representation of a sound; unit index vs. time (spike-raster-like plot) shown above the waveform x(t)]


Time shifting

[Figure: event representation (unit index vs. time) and waveform x(t) after the signal is time-shifted]

