Efficient Coding of Natural Sounds
Michael Lewicki
Center for the Neural Basis of Cognition & Department of Computer Science
Carnegie Mellon University
Michael S. Lewicki, Carnegie Mellon University, Oct 21, 2002
How does the brain encode complex sensory signals?
Outline
Motivations
Efficient coding theory
Application to natural sounds
Interpretation of experimental data
Efficient coding in population spike codes
A wing would be a most mystifying structure if one did not know that birds flew.
Horace Barlow, 1961
Natural signals are redundant
[Figure: correlation of adjacent pixels in a natural image; both axes span 0–1]
Efficient coding hypothesis (Attneave, 1954; Barlow, 1961; et al):
Sensory systems encode only non-redundant structure
Why code efficiently?
Information bottleneck of sensory coding:
• restrictions on information flow rate
– channel capacity of sensory nerves
– computational bottleneck
– 5 × 10⁶ → 40–50 bits/sec
• facilitate pattern recognition
– independent features are more informative
– better sensory codes could simplify further processing
• other ideas
– efficient energy use
– faster processing time
How do we use this hypothesis to predict sensory codes?
A simple example: efficient coding of a single input
(from Atick, 1992)
How to set sensitivity?
• too high ⇒ response saturated
• too low ⇒ range under-utilized
• inputs follow distribution of sensory environment
• encode so that output levels are used with equal frequency
• each response state has equal area (⇒ equal probability)
• continuum limit is cumulative pdf of input distribution
For y = g(c):

y / y_max = ∫_{c_min}^{c} P(c′) dc′
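The equal-use rule above can be sketched numerically: encoding each input by its rank under the empirical cumulative distribution makes every output level fire with equal frequency. A minimal illustration (the gamma-distributed input ensemble is an arbitrary assumption, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed input ensemble, standing in for natural contrasts.
c = rng.gamma(shape=2.0, scale=1.0, size=100_000)

# Encoder y = g(c): the empirical cumulative distribution function,
# so y/y_max equals the integral of P(c') up to c.
c_sorted = np.sort(c)

def g(x, y_max=1.0):
    # Fraction of the ensemble below x -> output in [0, y_max).
    return y_max * np.searchsorted(c_sorted, x) / len(c_sorted)

y = g(c)

# Output levels are now used with (nearly) equal frequency:
counts, _ = np.histogram(y, bins=10, range=(0.0, 1.0))
print(counts)
```

Each of the 10 output bins ends up holding about 10% of the samples, regardless of how skewed the input distribution is.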
Testing the theory: Laughlin, 1981
• predict response of fly LMC (large monopolar cells)
– interneuron in compound eye
• output is graded potential
• collect natural scenes to estimate stimulus pdf
• predict contrast response function
⇒ fly LMC transmits information efficiently

What about complex sensory patterns?
V1 receptive fields are consistent with efficient coding theory
V1 receptive fields are well-fit by 2D Gabor functions (Jones and Palmer, 1987).
Does this yield an efficient code?
Coding images with pixels (Daugman, 1988)
[Figure: "Lena" image and histogram of its pixel values; entropy = 7.57 bits]
High entropy means high redundancy ⇒ a very inefficient code
Recoding with Gabor functions (Daugman, 1988)
Pixel entropy = 7.57 bits. Recoding with 2D Gabor functions: filter output entropy = 2.55 bits.
Can these codes be predicted?
Sparse coding of natural images (Olshausen and Field, 1996)
[Figure: network of visual input units and receptive fields, before and after learning on natural scenes]

Adapt a population of receptive fields to:

• accurately encode an ensemble of natural images
• maximize the sparseness of the output, i.e. minimize entropy
Theory predicts entire population of receptive fields
(Lewicki and Olshausen, 1999)
Population of receptive fields (black = inhibitory; white = excitatory).
Overlaid response property schematics.
Algorithm selects best of many possible sensory codes

[Figure: example basis functions for Learned, Gabor, Wavelet, Fourier, Haar, and PCA codes]

(Lewicki and Olshausen, 1999)
Theoretical perspective: not edge "detectors" but an efficient way to describe natural, complex images.
Comparing codes on natural images

[Figure: estimated bits per pixel (0–8) for Gabor, wavelet, Haar, Fourier, PCA, and learned codes]
Efficient coding of natural sounds
Efficient coding: focus on coding waveform directly
Goal:
Predict the optimal transformation of the acoustic waveform from the statistics of the acoustic environment.
Why encode sound by frequency?
Auditory tuning curves.
A simple model of waveform encoding
Data consists of waveform segments sampled randomly from a sound ensemble: x_{1:N}

Filterbank model:

a_i(t) = Σ_{τ=0}^{N−1} x(t − τ) h_i(τ)

The model only describes signals within the window of analysis.

How do we derive the filter shapes h_i(τ) that optimize coding efficiency?
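The filterbank equation above is an ordinary convolution of the waveform with each filter. A minimal sketch (the random waveform and random filter shapes are placeholders, not learned filters):

```python
import numpy as np

rng = np.random.default_rng(1)

N = 64                            # analysis window length
x = rng.standard_normal(512)      # stand-in waveform
h = rng.standard_normal((8, N))   # 8 hypothetical filter shapes h_i(tau)

# a_i(t) = sum_tau x(t - tau) * h_i(tau): one convolution per filter.
a = np.array([np.convolve(x, h_i, mode="valid") for h_i in h])
print(a.shape)  # (num_filters, len(x) - N + 1)
```

Each row of `a` is the response of one filter across the signal; "valid" mode keeps only the outputs where the whole window overlaps the data.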
Information theoretic viewpoint
Use Shannon’s source coding theorem.

L = E[l(X)] ≥ Σ_x p(x) log 1/q(x)
  = Σ_x p(x) log p(x)/q(x) + Σ_x p(x) log 1/p(x)
  = D_KL(p‖q) + H(p)
If the model density q(x) equals the true density p(x), then D_KL = 0.
⇒ q(x) gives a lower bound on the average code length.
greater coding efficiency ⇔ more learned structure
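The decomposition above can be checked numerically on a toy discrete source (the particular values of p and q below are arbitrary illustrations):

```python
import numpy as np

# Toy true distribution p and model distribution q over 4 symbols.
p = np.array([0.5, 0.25, 0.15, 0.10])
q = np.array([0.25, 0.25, 0.25, 0.25])

cross_entropy = -np.sum(p * np.log2(q))      # avg. code length using q
entropy       = -np.sum(p * np.log2(p))      # H(p)
dkl           =  np.sum(p * np.log2(p / q))  # D_KL(p || q)

# The identity on the slide: cross-entropy = D_KL(p||q) + H(p).
assert np.isclose(cross_entropy, dkl + entropy)
print(cross_entropy, entropy, dkl)
```

With q uniform the code costs exactly 2 bits/symbol; the gap above H(p) is precisely D_KL, and it vanishes only when q = p, i.e. when the model has captured the source's statistics.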
Principle
Good codes capture the statistical distribution of sensory patterns.
How do we describe the distribution?
Describing signals with a simple statistical model
Goal is to encode the data to desired precision
x = a⃗₁s₁ + a⃗₂s₂ + · · · + a⃗_L s_L + ε⃗
  = As + ε
Can solve for s in the no noise case
s = A−1x
Want algorithm to choose optimal A (i.e. the basis matrix).
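A minimal sketch of the noiseless model (the basis matrix and coefficients below are random placeholders for an optimized A):

```python
import numpy as np

rng = np.random.default_rng(2)

L = 16                            # window length = number of basis vectors
A = rng.standard_normal((L, L))   # hypothetical square basis matrix
s = rng.laplace(size=L)           # sparse-ish coefficients

x = A @ s                         # noiseless generative model: x = A s

# With square, invertible A and no noise, the code is recovered exactly.
# np.linalg.solve is equivalent to A^{-1} x but numerically more stable.
s_hat = np.linalg.solve(A, x)
assert np.allclose(s_hat, s)
```

Solving the linear system rather than forming an explicit inverse is the usual design choice; both implement s = A⁻¹x.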
Algorithm for deriving efficient codes
Learning objective:
maximize coding efficiency
⇒ maximize P(x|A) over A (basis for the analysis window, or filter shapes).

Probability of the pattern ensemble:

P(x₁, x₂, …, x_N | A) = ∏_k P(x_k | A)

To obtain P(x|A), marginalize over s:

P(x|A) = ∫ ds P(x|A, s) P(s) = P(s) / |det A|

Using independent component analysis (ICA) to optimize A:

ΔA ∝ A Aᵀ (∂/∂A) log P(x|A) = −A(z sᵀ − I),

where z = (log P(s))′. Use P(s_i) ∼ ExPwr(s_i | µ, σ, β_i).

This learning rule:

• learns features that capture the most structure
• optimizes the efficiency of the code
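One step of the natural-gradient update above can be sketched directly. As an assumption, a Laplacian prior stands in for the slide's exponential power density, giving z = (log P(s))′ = −sign(s):

```python
import numpy as np

rng = np.random.default_rng(3)

L = 8
A = np.linalg.qr(rng.standard_normal((L, L)))[0]  # initial basis (orthonormal start)
x = rng.standard_normal(L)                        # one data window

s = np.linalg.solve(A, x)   # coefficients s = A^{-1} x
z = -np.sign(s)             # z = (log P(s))' for a Laplacian prior
                            # (stand-in for the ExPwr density on the slide)

eta = 0.01
dA = -A @ (np.outer(z, s) - np.eye(L))  # natural-gradient direction
A_new = A + eta * dA
print(A_new.shape)
```

Iterating this over many random windows adapts A toward a basis whose coefficients match the assumed sparse prior; at a fixed point, E[z sᵀ] = I.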
Modeling Non-Gaussian distributions with ICA
[Figure: exponential power distributions with β = 2 and β = 4 over coefficients s₁ and s₂]

• Typical coefficient distributions of natural signals are non-Gaussian.
• Independent component analysis (ICA) describes the statistical structure of non-Gaussian distributions.
• The distribution is fit by optimizing the filter shapes.
• Unlike PCA, vectors are not restricted to be orthogonal.
• This permits a much better description of the actual distribution of natural signals.
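The exponential power family on this slide can be written, under one common parametrization (an assumed reading of the slide's ExPwr, not taken from it), as p(s) ∝ exp(−|(s − µ)/σ|^β); β = 2 recovers the Gaussian, and other exponents change the tail weight:

```python
import numpy as np
from math import gamma

def expwr_pdf(s, mu=0.0, sigma=1.0, beta=2.0):
    """Exponential power (generalized Gaussian) density.
    beta = 2 is the Gaussian; smaller beta gives heavier tails.
    This parametrization is an assumption, not the slide's exact form."""
    z = np.abs((s - mu) / sigma)
    norm = beta / (2.0 * sigma * gamma(1.0 / beta))
    return norm * np.exp(-z**beta)

s = np.linspace(-6, 6, 1201)
for beta in (1.0, 2.0, 4.0):
    area = np.trapz(expwr_pdf(s, beta=beta), s)  # should be ~1
    print(beta, round(area, 3))
```

Fitting β per coefficient is what lets the model track how sparse (heavy-tailed) each filter output actually is.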
Efficient coding of natural sounds: Learning procedure
To derive the filters:
• select sound segments randomly from sound ensemble
• optimize filter shapes to maximize coding efficiency
What sounds should we use?
What are auditory systems adapted for?
• localization / environmental sounds?
• communication / vocalizations?
• specific tasks, e.g. sound discrimination?

We used the following sound ensembles:

• non-harmonic environmental sounds (e.g. footsteps, stream sounds, etc.)
• animal vocalizations (rainforest mammals: chirping, screeching, cries, etc.)
• speech (samples from 100 male and female speakers from the TIMIT corpus)
Results of adapting filters to different sound classes
Efficient filters for environmental sounds:
Efficient filters for speech:
Efficient filters for animal vocalizations:
• Each result shows only a subset
• Auditory nerve filters best match those derived from environmental sounds and speech
• learning movie
Upsampling removes aliasing due to periodic sampling
A combined ensemble: env. sounds and vocalizations
Efficient filters for the combined ensemble vs. efficient filters for speech:

Varying the relative proportion of the two ensembles moves the filters along a continuum; the best match to auditory data is 2:1
⇒ speech is well-matched to the auditory code
Can decorrelating models also explain data?
Redundancy reduction models that adapt weights to decorrelate output activities assume a Gaussian model:
x ∼ N (x|µ, σ)
Under this model, the filters can be derived with principal component analysis.
PCs of Environmental Sounds: Corresponding Power Spectra:
⇒ just decorrelating the outputs does not yield time-frequency localized filters.
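Under the Gaussian model, the decorrelating filters are the principal components, i.e. the eigenvectors of the data covariance. A sketch on synthetic segments (the correlated-noise "sound" is an assumption standing in for real recordings):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in "sound segments": overlapping windows of a noise signal,
# which makes adjacent samples within a segment correlated.
T, N = 2000, 32
raw = rng.standard_normal(T + N)
segments = np.array([raw[t:t + N] for t in range(T)])

# PCA: eigenvectors of the covariance matrix are the decorrelating filters.
C = np.cov(segments, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues

# Projections onto the principal components are uncorrelated:
proj = (segments - segments.mean(axis=0)) @ eigvecs
C_proj = np.cov(proj, rowvar=False)
off_diag = C_proj - np.diag(np.diag(C_proj))
print(np.abs(off_diag).max())  # ~0: outputs are decorrelated
```

Decorrelation is all a Gaussian model can ask for; it says nothing about the higher-order (sparse, non-Gaussian) structure that produces localized filters.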
Why doesn’t PCA work?
Check assumptions:
x = As and x ∼ N (x|µ, σ)
⇒ distribution of s should also be Gaussian.
Actual distribution of filter coefficients:
Efficient coding of sparse noise
Learned sparse noise filters:
Efficient filters are delta functions that represent different time points in the analysis window.
...but what about the auditory system?
Auditory filters estimated by reverse correlation
[Diagram: x(t) → linear system → y(t) → static nonlinearity → z(t) → stochastic pulse generator → s(t)]

Cat auditory “revcor” filters:

[Figure: revcor filter waveforms at several characteristic frequencies; time axes in ms]

(de Boer and de Jongh, 1978; Carney and Yin, 1988)
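Reverse correlation estimates the linear stage by averaging the white-noise stimulus preceding each spike (the spike-triggered average). A simulated sketch; the toy filter and the simple threshold "spike generator" are assumptions standing in for a real neuron's cascade:

```python
import numpy as np

rng = np.random.default_rng(5)

N = 32
true_h = np.exp(-np.arange(N) / 8.0) * np.sin(np.arange(N) / 2.0)  # toy filter

x = rng.standard_normal(200_000)                   # white-noise stimulus
y = np.convolve(x, true_h, mode="full")[:len(x)]   # linear stage output
spikes = np.flatnonzero(y > 2.5)                   # crude threshold "spikes"
spikes = spikes[spikes >= N]

# Reverse correlation: average the stimulus segment preceding each spike,
# time-reversed so index tau matches the filter h(tau).
sta = np.mean([x[t - N + 1:t + 1][::-1] for t in spikes], axis=0)

corr = np.corrcoef(sta, true_h)[0, 1]
print(round(corr, 2))
```

For Gaussian white-noise input, the spike-triggered average recovers the linear filter up to a scale factor, which is why revcor filters characterize the cell's linear receptive field.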
Revcor filter predictions of auditory nerve response
(from de Boer and de Jongh, 1978).
• stimulus is white noise
• histogram: measured auditory nerve response

• smooth curve: predicted response
Conclusion:
The shape and distribution of revcor filters account for a large part of the auditory sensory code.

We want to match more than just individual filters:

How do we characterize the population?
[Figure: derived filter waveforms (time, ms) and their power spectra (magnitude, dB vs. frequency, kHz) for vocalizations, speech, and environmental sounds]

Schematic time-frequency distributions

[Figure: tiling of the time-frequency plane for a Fourier code vs. a typical wavelet code]
Animal vocalizations:
[Figure: time-frequency tiling of derived filters; frequency (kHz) vs. time (ms)]

Speech:
[Figure: time-frequency tiling of derived filters; frequency (kHz) vs. time (ms)]

Environmental sounds:
[Figure: time-frequency tiling of derived filters; frequency (kHz) vs. time (ms)]
Tiling trends follow power law
[Figure: bandwidth (kHz) and temporal envelope (ms) vs. center frequency (kHz), log-log axes]

’×’ = environmental sounds, ’◦’ = speech, ’+’ = vocalizations
Does equalization of power explain these data?
Average power spectra:
[Figure: mean power (dB) vs. frequency (kHz) for environmental sounds, vocalizations, and speech]
Equal power across frequency bands:
[Figure: bandwidth (kHz) vs. frequency (kHz) implied by equal power per band, for environmental sounds, vocalizations, and speech]
Comparison to auditory population code
Cat auditory nerves
[Figure: Q10dB vs. characteristic frequency (0.2–20 kHz) for cat auditory nerve fibers]

(Evans, 1975; Rhode and Smith, 1985)
Filter sharpness characterizes how bandwidth changes as a function of frequency:

Q10dB = fc / w10dB
Derived filters
[Figure: Q10dB vs. center frequency (kHz) for the derived filters]

’+’ vocalizations, ’◦’ speech, ’×’ environmental sounds
Summary
Information theory and efficient coding:
• can be used to derive optimal codes for different pattern classes.
• explains important properties of sensory codes in both the auditory and visual systems.

• gives insight into how our sensory systems are adapted to the natural environment.
Caveats
• Codes can only be derived within a small window
• Does not explain non-linear aspects of coding
• Models do not capture higher order structure
Coding natural sounds with spikes
Addressing some limitations of the current theory
The current model assumes the sound waveform is divided into blocks:

[Figure: waveform (amplitude vs. time) segmented into fixed analysis blocks]

Problems with block coding:

• signal structure is arbitrarily aligned
• the code depends on block alignment
• difficult to encode non-periodic structure, e.g. rapid onsets
An efficient, shift-invariant model
The signal is modeled by a sum of events plus noise:
x(t) = s1φ1(t− τ1) + · · ·+ sMφM(t− τM) + ε(t) .
The events φm(t):
• can be placed at arbitrary time points τm
• are scaled by coefficients sm
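The generative side of the shift-invariant model is easy to sketch: a signal is built by dropping scaled kernels at arbitrary times and adding noise (the kernels, event list, and noise level below are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

T = 200
x = np.zeros(T)

# Two hypothetical event kernels phi_m(t).
phis = [np.hanning(16) * np.sin(np.arange(16)),
        np.hanning(24) * np.cos(np.arange(24) / 2.0)]

# Events: (kernel index, time tau_m, scale s_m), placed at arbitrary times.
events = [(0, 30, 1.5), (1, 90, -0.8), (0, 140, 0.6)]
for m, tau, s in events:
    phi = phis[m]
    x[tau:tau + len(phi)] += s * phi   # x(t) += s_m * phi_m(t - tau_m)

x += 0.01 * rng.standard_normal(T)     # noise term epsilon(t)
print(len(x))
```

Because the τ_m are continuous free parameters rather than fixed block positions, the same event is coded identically wherever it occurs, which is exactly what block codes fail to do.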
Solution after optimization: 105 dB SNR
[Figure: spikegram of the optimized solution; unit index vs. time, with the waveform x(t) below]
Time shifting
[Figure: spikegram after time shifting; unit index vs. time, with the waveform x(t) below]