
798.01 / Y5
Understanding neural representations in early visual areas using convolutional neural networks

Yimeng Zhang a, Corentin Massot a, Tiancheng Zhi b, George Papandreou c, Alan Yuille c, Tai Sing Lee a

a Center for the Neural Basis of Cognition and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
b Peking University, Beijing, China
c UCLA, Los Angeles, CA

Motivations

• Convolutional neural networks (CNNs) have feature representations similar to those in higher layers of the primate and human visual cortex (Agrawal et al., 2014; Khaligh-Razavi & Kriegeskorte, 2014; Yamins et al., 2014).
• Recent data on V1 neurons suggest that they may encode much more complex features than classically assumed (see poster 798.03/Y7).
• CNNs might therefore be a useful tool for understanding the encoding of complex features in lower layers (V1/V2) of the visual cortex as well.

Images and neural data

• Responses of 286 V1 and 390 V2 neurons in 2 monkeys to 150 stimuli, recorded with multi-electrode arrays.
• The 150 stimuli comprise 3 subsets of 50 each: Edge (E), Appearance (A), and Exemplar (EX), in order of increasing complexity.

[Figure: example Edge, Appearance, and Exemplar stimuli (25 stimuli per panel); example neural responses; and tuning curves (firing rate in spk/s vs. ranked stimuli) for the E, A, and EX subsets.]

• Responses of around 3000 V1 neurons in 3 monkeys to 2250 natural images, recorded with calcium imaging (see poster 798.03/Y7).

Computer vision models

• CNN: "AlexNet" (Krizhevsky et al., 2012). Input 227 x 227 x 3; conv1 (convolve + threshold) gives 55 x 55 x 96, followed by norm1 and pool1 (27 x 27 x 96), six further layers up to pool5 (6 x 6 x 256), and fully connected layers fc6 and fc7 (product + threshold, 1 x 1 x 2048 each).
• V1like / V1likeSC (Pinto et al., 2008): convolve with a filter bank, threshold & saturate, normalize, and pool. V1like uses fixed Gabor filters; V1likeSC uses learned filters instead. A minimal sketch follows.

[Figure: schematics of the AlexNet and V1like / V1likeSC pipelines, with example filters for each model.]
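As a concrete illustration, here is a minimal sketch of a V1like-style pipeline in the spirit of Pinto et al. (2008): convolve with a filter bank, threshold and saturate, normalize, and pool. The Gabor parameters, the scikit-image `gabor_kernel` helper, and the pooling size are illustrative assumptions, not the exact settings used on the poster.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel  # illustrative filter bank

def v1like_features(image, n_orientations=8, frequency=0.2, pool=8):
    """Sketch of a V1like pipeline: convolve with a Gabor bank,
    threshold & saturate, divisively normalize, then spatially pool."""
    maps = []
    for i in range(n_orientations):
        theta = i * np.pi / n_orientations
        kernel = np.real(gabor_kernel(frequency, theta=theta))
        maps.append(fftconvolve(image, kernel, mode="same"))
    x = np.stack(maps, axis=-1)                  # H x W x n_orientations
    x = np.clip(x, 0.0, 1.0)                     # threshold & saturate
    x = x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-6)  # normalize
    h, w, c = x.shape
    x = x[:h - h % pool, :w - w % pool]          # crop to a multiple of pool
    hp, wp = x.shape[0] // pool, x.shape[1] // pool
    x = x.reshape(hp, pool, wp, pool, c).max(axis=(1, 3))  # max pooling
    return x.ravel()                             # final feature vector

# usage: features = v1like_features(gray_image_as_2d_float_array)
```

Swapping the fixed Gabor bank for learned filters would, presumably, give the V1likeSC variant compared on the poster.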

Model comparison using Representational Similarity Analysis

• RSA (Kriegeskorte et al., 2008) was used to compare model representations φ_model and neural representations φ_area (where area can be V1, V2, etc.).
• Similarity between two representations is defined as the Spearman correlation coefficient of their representational dissimilarity matrices (RDMs), each of which captures the pairwise distances between images for a given representation.

[Figure: example RDMs over 50 images, RDM(φ_model) computed from 20 model units and RDM(φ_V1) from 30 neurons.]

For a representation φ evaluated on images x_1, ..., x_50,

    RDM(φ)_ij = 1 − ρ(φ(x_i), φ(x_j)),

where ρ denotes Pearson's correlation.
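A minimal sketch of this comparison, assuming feature matrices with one row per image (the names and shapes below are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(features):
    """RDM_ij = 1 - Pearson correlation between the feature vectors
    of images i and j (rows of `features` are images)."""
    return 1.0 - np.corrcoef(features)

def rsa_similarity(model_features, neural_responses):
    """Spearman correlation between the off-diagonal entries of the
    model RDM and the neural RDM."""
    n = model_features.shape[0]
    iu = np.triu_indices(n, k=1)  # upper triangle, excluding the diagonal
    rho, _ = spearmanr(rdm(model_features)[iu], rdm(neural_responses)[iu])
    return rho

# e.g. 50 images: 20 model units vs. 30 recorded neurons
similarity = rsa_similarity(np.random.randn(50, 20), np.random.randn(50, 30))
```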

Results

[Figure: Spearman correlation with the neural RDM for each model vs. V1 and V2 on the 150 stimuli set, by E/A/EX subset (left); normalized similarity of each CNN layer vs. V1 and V2 on the 150 stimuli set (top right); similarity of each CNN layer vs. V1 on the 2250 stimuli set (bottom right).]

• Left: comparison of models on the 150 stimulus set. Top right: all CNN layers on the 150 stimulus set. Bottom right: all CNN layers on the 2250 stimulus set.
• Horizontal lines estimate the achievable similarity, computed from the similarities of feature representations among different monkeys; this is similar in spirit to "explainable variance" (see the sketch after this list).
• 150 set: CNN > V1likeSC > V1like, especially on complex stimuli (EX). The best matching CNN layer is stimulus dependent: simpler stimuli (E) are best explained by lower layers, and complex stimuli (EX) by higher layers.
• 2250 set: the CNN is far from the achievable similarity, suggesting missing constraints in the CNN.
• Higher CNN layers can match V1/V2 as well as (or even better than) lower layers, suggesting complex coding in V1/V2 neurons.
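The inter-monkey "achievable similarity" could be estimated along these lines; this is a hedged sketch reusing `rsa_similarity` from the RSA sketch above, since the exact pairing and averaging scheme is not given on the poster.

```python
from itertools import combinations
import numpy as np

def achievable_similarity(responses_by_monkey):
    """Noise-ceiling estimate: mean RSA similarity between the neural
    representations (n_stimuli x n_neurons) of every pair of monkeys."""
    pairs = combinations(responses_by_monkey, 2)
    return float(np.mean([rsa_similarity(a, b) for a, b in pairs]))

# usage: ceiling = achievable_similarity([monkey1_resp, monkey2_resp, monkey3_resp])
```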

Neuron matching and visualization

• Visualizations of some best matching units were obtained using deconvolution (figure below); a simpler gradient-based stand-in is sketched after this bullet.
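The poster's visualizations come from deconvolution (Zeiler & Fergus style). As a rough, simpler stand-in, a unit's preferred input structure can be approximated by the gradient of its activation with respect to the input pixels. This sketch assumes a torchvision AlexNet; the layer index and channel are hypothetical, and this is not the poster's exact method.

```python
import torch
from torchvision.models import alexnet

def unit_input_gradient(image, layer_idx=5, channel=0):
    """Gradient-based stand-in for deconvolution: backpropagate one
    channel's activation at a chosen layer to the input pixels."""
    features = alexnet(weights="DEFAULT").features.eval()
    x = image.clone().requires_grad_(True)   # shape 1 x 3 x 227 x 227
    h = x
    for i, layer in enumerate(features):
        h = layer(h)
        if i == layer_idx:                   # stop at the chosen layer
            break
    h[0, channel].sum().backward()           # drive the chosen channel
    return x.grad[0]                         # per-pixel sensitivity map
```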

[Figure: distributions of the layers (conv1 through fc7) of best matching CNN units for each individual neuron, as normalized proportions, separately for V1 (peaking at pool1) and V2 (peaking at pool2) and for the E, A, and EX datasets; deconvolution visualizations of example matched units, panels a (E), b (E), c (EX), d (EX), e (E), and f (EX).]

• Single neuron matching results were consistent with RSA: V1 matched better to pool1, V2 to pool2. Complex stimuli (EX) shifted the matches to higher layers compared to simple stimuli (E). V2 neurons were also more correlated with higher-layer CNN units than V1 neurons (a minimal matching sketch follows below).
• While some neurons have visualizations consistent with the existing literature (a, b, c, e), some neurons preferred more complex features (d, f).
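A minimal sketch of the per-neuron matching, assuming that each neuron's responses and each CNN unit's activations to the same stimuli are available; the layer names and array shapes are illustrative, and the exact matching criterion on the poster is not specified.

```python
import numpy as np

def best_matching_layer(neuron_responses, layer_activations):
    """For one neuron, find the CNN layer containing the unit whose
    responses correlate best with the neuron across all stimuli.

    neuron_responses:  (n_stimuli,) firing rates
    layer_activations: dict of layer name -> (n_stimuli, n_units) array
    """
    z_neuron = (neuron_responses - neuron_responses.mean()) / neuron_responses.std()
    best_layer, best_r = None, -np.inf
    for name, acts in layer_activations.items():
        z_units = (acts - acts.mean(0)) / (acts.std(0) + 1e-12)
        # mean of z-score products = Pearson r; take the best unit in the layer
        r = (z_units * z_neuron[:, None]).mean(0).max()
        if r > best_r:
            best_layer, best_r = name, r
    return best_layer, best_r
```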

Why CNN performs better

• Network effects. Without normalization and pooling, V1like performed worse (not shown).
• Diverse filter shapes. V1likeSC and pool1 are better than V1like partly due to their learned, diverse filters, compared to the fixed Gabor filters in V1like.
• Network architecture might contribute as well. On the 2250 stimulus set, higher CNN layers performed better than lower layers even with all network parameters set at random (see the sketch below).
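The random-parameter control could be reproduced roughly as follows; this sketch assumes a torchvision AlexNet (the poster's own implementation is not specified) with every learned weight re-drawn at random.

```python
import torch
from torch import nn
from torchvision.models import alexnet

def random_alexnet(seed=0):
    """AlexNet with all parameters drawn at random, so that any fit to
    neural data reflects the architecture rather than training."""
    torch.manual_seed(seed)
    model = alexnet(weights=None)        # architecture only, no training
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.normal_(m.weight, std=0.01)  # random Gaussian weights
            nn.init.zeros_(m.bias)
    return model.eval()
```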

Conclusion

• Some V1/V2 neurons may encode more complex features than previously thought.
• CNNs are a good approximate model for understanding and visualizing V1 and V2 neurons.
• Future work: (1) add more biological constraints to CNN models so that they explain the neural data better; (2) explore CNNs with heterogeneous layers, each layer containing units of different complexities.

References

• Agrawal, P., et al. 2014, arXiv.org.
• Khaligh-Razavi, S.-M., & Kriegeskorte, N. 2014, PLoS Computational Biology, 10, e1003915.
• Kriegeskorte, N., Mur, M., & Bandettini, P. A. 2008, Frontiers in Systems Neuroscience, 2.
• Krizhevsky, A., Sutskever, I., & Hinton, G. E. 2012, in NIPS 25, 1097-1105.
• Olshausen, B. A. 2013, in IS&T/SPIE Electronic Imaging.
• Pinto, N., Cox, D. D., & DiCarlo, J. J. 2008, PLoS Comput Biol, 4, e27.
• Yamins, D. L. K., et al. 2014, Proceedings of the National Academy of Sciences, 111, 8619.

Acknowledgments

This research is supported by IARPA MICRONS contract #D16PC00007 and NIH R01 EY022247. Thanks to Professor Shiming Tang for the calcium imaging monkey data.

Contact: [email protected]