

Generating the Features for Category Representation using a Deep Convolutional Neural Network

Chen-Ping Yu 1 ([email protected]), Justin Maxfield 2, and Gregory J. Zelinsky 1,2
1 Dept. of Computer Science, 2 Dept. of Psychology, Stony Brook University

Introduction

Question: The search for object categories, categorical search, is characterized by a subordinate-level advantage in target guidance (e.g., time to target fixation) and a basic-level advantage in target verification. What are the underlying features that make these effects possible?

Exp #1: Previous work [16] modeled categorical guidance and verification over three levels of a hierarchy (subordinate, basic, and superordinate) using category-consistent features built from SIFT and Bag-of-Words. How would fixation prediction using these biologically-uninspired features compare to prediction using features learned by a CNN modeled on the ventral visual stream?

Exp #2: How do fixation predictions using our Ventral-Stream Network (VsNet) model compare to those of the AlexNet and Deep-HMAX DCNN models?

References & Acknowledgments

[1] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
[2] A. Krizhevsky. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv, 2014.
[3] T. Serre, A. Oliva, and T. Poggio. A Feedforward Architecture Accounts for Rapid Categorization. PNAS, 2007.
[4] D. Kravitz, K. Saleem, C. Baker, L. Ungerleider, and M. Mishkin. The Ventral Visual Pathway: An Expanded Neural Framework for the Processing of Object Quality. Trends in Cognitive Sciences, 2013.
[5] D. Boussaoud, R. Desimone, and L. Ungerleider. Visual Topography of Area TEO in the Macaque. Journal of Comparative Neurology, 1991.
[6] S. Kastner, P. De Weerd, M. Pinsk, M. Elizondo, R. Desimone, and L. Ungerleider. Modulation of Sensory Suppression: Implications for Receptive Field Sizes in the Human Visual Cortex. Journal of Neurophysiology, 2001.
[7] A. Smith, A. Williams, and M. Greenlee. Estimating Receptive Field Size from fMRI Data in Human Striate and Extrastriate Visual Cortex. Cerebral Cortex, 2001.
[8] G. Rousselet, S. Thorpe, and M. Fabre-Thorpe. How Parallel is Visual Processing in the Ventral Pathway? Trends in Cognitive Sciences, 2004.
[9] B. Harvey and S. Dumoulin. The Relationship between Cortical Magnification Factor and Population Receptive Field Size in Human Visual Cortex: Constancies in Cortical Architecture. Journal of Neuroscience, 2011.
[10] J. Freeman and E. Simoncelli. Metamers of the Ventral Stream. Nature Neuroscience, 2011.
[11] M. Jenkin and L. Harris. Vision and Attention. Springer Science & Business Media, 2013. (Book)
[12] R. Greger and U. Windhorst. Comprehensive Human Physiology, Vol. 1: From Cellular Mechanisms to Integration. Springer, 1996. (Book)
[13] K. Tanaka. Mechanisms of Visual Object Recognition: Monkey and Human Studies. Current Opinion in Neurobiology, 1997.
[14] J. Larsson and D. Heeger. Two Retinotopic Visual Areas in Human Lateral Occipital Cortex. Journal of Neuroscience, 2006.
[15] D. Felleman and D. Van Essen. Distributed Hierarchical Processing in the Primate Cerebral Cortex. Cerebral Cortex, 1991.
[16] C.-P. Yu, J. Maxfield, and G. Zelinsky. Searching for Category-Consistent Features: A Computational Approach to Understanding Visual Category Representation. Psychological Science, 2016.
[17] C. Cadieu, H. Hong, D. Yamins, N. Pinto, D. Ardila, E. Solomon, N. Majaj, and J. DiCarlo. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition. PLOS Computational Biology, 2014.

This work was supported by NIMH grant R01-MH063746 and NSF grants IIS-1111047 and IIS-1161876 to G.J.Z.


Categorical Search & Category-Consistent Features (CCFs)

Hierarchical categories: We collected a 68-category hierarchical dataset, with 48, 16, and 4 subordinate, basic, and superordinate categories, respectively.

Dataset: 500 training and 50 validation images per category, for a total of 26,400 images over the 48 subordinate categories (basic and superordinate categories are derived from the subordinates).

Behavioral data: Time-to-target (guidance) and subsequent verification times were measured using a categorical search task (N = 26).

Category-Consistent Features (CCFs): visual features, learned by an unsupervised generative model, that represent an object category; they are defined as the common features that occur frequently and reliably across a category's exemplars.

SIFT BoW-CCF: Previous work [16] used a Bag-of-Words model with SIFT features as the base feature pool (1064-d); features with a high signal-to-noise ratio (SNR) were identified as the CCFs. This model predicted guidance and verification across hierarchical levels, but not at the level of individual categories!

(Figures: Taxi SIFT BoW histogram and the selected Taxi SIFT CCFs.)

CNN-CCFs: CNNs learn hierarchical features directly from images. Can CCFs extracted from CNNs do better at predicting guidance and verification performance? Yes, even at the level of individual object categories!

Biological plausibility: Most CNNs are trained purely for classification accuracy, but they have also been used to predict behavioral and neural data [17]. We evaluated our VsNet model against AlexNet and a biologically-inspired deep version of HMAX [3] (Deep-HMAX).

Deep Convolutional Neural Network Models (DCNNs)

(Architecture diagram) Deep-HMAX (inspired by [3]), ~76M parameters, 1,760 conv units. Alternating simple-cell (s) and complex-cell (c) stages with BN and ReLU, a by-pass connection, max/average pooling between stages, a DepthConcat whose 640 channels are reduced to 512 by a 1x1 conv, and fully connected layers producing the class labels. RF sizes by stage: V1 7 px = 1.4°, V2 13 px = 2.6°, V4s 17 px = 3.4°, V4c 29 px = 5.8°, TEOs2a 21 px = 4.2°, TEOs1 37 px = 7.4°, TEOs2b 61 px = 12.2°, TEOc 77 px = 15.4°, TE1 77 px = 15.4°, TE2 37/93 px = 7.4°/18.6°.


(Architecture diagram) Ventral-Stream Network (VsNet), ~72M parameters, 835 conv units. Each area is implemented as parallel conv kernels of different sizes (listed as: number of kernels, kernel size, stride, with the kernel's modeled RF range in degrees), each followed by BN and ReLU and merged by DepthConcat:
V1 (RF-size estimates 0.25°~2°; <2° at 5.5° eccentricity): 21 3x3,2 (0.6°); 22 5x5,2 (1.0°); 21 7x7,2 (1.4°); DepthConcat(64).
V2 (0.5°~4°; 2°~4° at 5.5°): 22 3x3,1 (1.4°~2.2°); 32 5x5,1 (2.2°~3.0°); 32 7x7,1 (3.0°~3.8°); DepthConcat(86).
hV4+LO1+LO2 (1.5°~8°; 4°~6° at 5.5°): 76 3x3,1 (3.8°~6.2°); 52 5x5,1 (5.4°~7.8°); DepthConcat(128); 3x3,2 max pool.
TEO (~7°+; >7° at 5.5°): 96 3x3,1 (5.4°~9.4°); 52 5x5,1 (7.0°~11°); DepthConcat(224); 3x3,2 max pool.
TE (2.5°~70°): 111 3x3,1 (10.2°~15.8°); 111 5x5,1 (13.4°~19.0°); 111 7x7,1 (16.6°~22.2°); DepthConcat(333); 5x5,4 max pool.
Fully connected layers then produce the class labels.

(Architecture diagram) AlexNet [1][2], ~61M parameters, 1,152 conv units. Conv layers (kernels, size, stride; RF size): conv1 64 11x11,4 (11 px = 2.2°); conv2 192 5x5,1 (51 px = 10.2°); conv3 384 3x3,1 (99 px = 19.8°); conv4 256 3x3,1 (131 px = 26.2°); conv5 256 3x3,1 (163 px = 32.6°). Each conv uses BN and ReLU, with 3x3,2 max pooling after conv1, conv2, and conv5, followed by fully connected layers producing the class labels.
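The RF sizes annotated in these diagrams follow from the kernel sizes and strides. A minimal sketch (not the authors' code) that reproduces the AlexNet values above, assuming the standard pooling placement:

```python
# Receptive-field (RF) sizes from kernel sizes and strides, via the standard
# recurrence rf_out = rf_in + (k - 1) * jump, jump_out = jump_in * stride.
def rf_sizes(layers):
    """layers: list of (name, kernel, stride); returns {name: rf in input pixels}."""
    rf, jump, out = 1, 1, {}
    for name, k, s in layers:
        rf += (k - 1) * jump
        jump *= s
        out[name] = rf
    return out

# AlexNet layer order as described above (pool placement assumed standard).
alexnet = [("conv1", 11, 4), ("pool1", 3, 2), ("conv2", 5, 1), ("pool2", 3, 2),
           ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1)]

# conv1: 11, conv2: 51, conv3: 99, conv4: 131, conv5: 163 pixels,
# i.e. 2.2, 10.2, 19.8, 26.2, 32.6 degrees under the poster's 1 deg = 5 px.
print(rf_sizes(alexnet))
```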

Ventral-Stream Network Design Principles

(Figure from [4]: the ventral visual pathway, annotated with weak and strong by-pass connections.)

Ventral-Stream area RF sizes: We pooled the human RF-size estimates from [4,5,6,7,8,9,10,11] for each area, based on 1º = 5 pixels.

Variable RF sizes within an area: Because neurons within each ventral-stream area span a range of RF sizes, we created parallel convolutional kernels of different sizes within each area (conv layer). The number of kernels in each parallel population depends on how closely that kernel's RF size matches the area's estimated RF sizes: closer matches receive a larger share of the units.
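A minimal PyTorch-style sketch of one such area with parallel kernel sizes and DepthConcat (illustrative only, not the authors' implementation; the class name is hypothetical, and the branch sizes are the V1 values from the diagram above):

```python
import torch
import torch.nn as nn

class ParallelRFBlock(nn.Module):
    """One ventral-stream 'area': parallel conv kernels of different sizes,
    each with BN + ReLU, concatenated along channels (DepthConcat)."""
    def __init__(self, in_ch, branches, stride=1):
        super().__init__()
        # branches: list of (num_kernels, kernel_size) pairs
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, n, k, stride=stride, padding=k // 2),
                nn.BatchNorm2d(n),
                nn.ReLU(inplace=True))
            for n, k in branches])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

# V1-like block from the VsNet diagram: 21 3x3, 22 5x5, 21 7x7 kernels, stride 2.
v1 = ParallelRFBlock(3, [(21, 3), (22, 5), (21, 7)], stride=2)
out = v1(torch.randn(1, 3, 224, 224))   # -> 64 channels (21 + 22 + 21)
```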

By-pass connections: V4 receives weak (foveal) by-pass inputs from V1 [12], while TEO and TE receive strong by-pass inputs from V2 and V4, respectively [13].
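A minimal sketch of how such a by-pass might be wired, with an earlier area's features resized and depth-concatenated into a later area's input (the channel counts and variable names below are illustrative, not taken from the poster):

```python
import torch
import torch.nn.functional as F

def with_bypass(main_feats, bypass_feats):
    """main_feats, bypass_feats: (N, C, H, W) tensors from different areas."""
    # Downsample the by-passed features to the later area's spatial size,
    # then stack them along the channel dimension (DepthConcat).
    resized = F.adaptive_avg_pool2d(bypass_feats, main_feats.shape[-2:])
    return torch.cat([main_feats, resized], dim=1)

v2_out = torch.randn(1, 86, 56, 56)     # e.g., V2 output (86 channels in VsNet)
v4_out = torch.randn(1, 128, 28, 28)    # e.g., V4 output feeding TEO
teo_in = with_bypass(v4_out, v2_out)    # 128 + 86 = 214 channels into TEO
```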

Number of neurons (conv kernels) per area: In biological visual systems, neurons with the same selectivity are tiled across the visual field, much like a convolution operation with a single filter, especially in the early layers. Therefore, the number of kernels per conv layer is estimated from each area's cortical surface area [14,15], such that:

    #conv_kernels ∝ Surface_Area / Duplication_Factor,  where  Duplication_Factor = log(Visual_Area / RF_Size^2)  and  Visual_Area = 1000.
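A small numerical sketch of this heuristic; the surface-area values and the proportionality scale below are hypothetical placeholders, and only the Visual_Area = 1000 constant comes from the poster:

```python
import math

VISUAL_AREA = 1000.0  # constant used in the poster's duplication factor

def kernel_count(surface_area, rf_size_deg, scale=1.0):
    """#conv_kernels proportional to surface_area / log(VISUAL_AREA / rf_size**2)."""
    duplication_factor = math.log(VISUAL_AREA / rf_size_deg ** 2)
    return scale * surface_area / duplication_factor

# Hypothetical surface areas (arbitrary units), for illustration only.
print(kernel_count(surface_area=440, rf_size_deg=1.0))   # small-RF, V1-like area
print(kernel_count(surface_area=300, rf_size_deg=6.0))   # larger-RF, V4-like area
```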

hV4+LO1+LO2: The literature suggests that object shape and boundary are also processed by LO1 and LO2 [14]; we therefore combine them with V4.

Experiments and Results

Negative correlation result: #CCFs vs. time-to-target (Spearman)

              VsNet                   Deep-HMAX*              AlexNet
              ρ        p       Layers ρ        p       Layers ρ        p       Layers
  All 68     -.3189    .008    2     -.2089    .0873   2     -.0364    .7679   5
  Sub 48     -.3674    .0102   2     -.1862    .2052   2     -.0317    .8306   1
  Basic 16   -.2336    .3838   1     -.2492    .352    1      n/a      n/a     n/a
  Super 4    -.6325    .5      1     -.4472    .6667   1     -.8       .3333   2
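Correlations of this kind could be computed with a sketch like the following, assuming per-category CCF counts and mean time-to-target values are available (all names and numbers here are illustrative):

```python
from scipy.stats import spearmanr

# Hypothetical inputs: one value per category (e.g., the 48 subordinates).
# ccf_counts[i] = number of CNN-CCFs extracted for category i
# ttt_ms[i]     = mean time-to-target (ms) for category i in the search task
def ccf_vs_guidance(ccf_counts, ttt_ms):
    rho, p = spearmanr(ccf_counts, ttt_ms)
    return rho, p   # a negative rho means more CCFs -> faster guidance

rho, p = ccf_vs_guidance([12, 30, 7, 19], [820, 640, 900, 715])  # toy numbers
print(rho, p)
```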

1. Pre-train the networks on ImageNet for classification.
2. Fine-tune the networks on our dataset using multi-task learning, with weighted label heads (described below).

CNN-CCFs: For each object category, extract the neurons (units) that are highly and reliably activated by its exemplars, using a thresholded signal-to-noise ratio (SNR). More CCFs = higher specificity = faster time-to-target [16].
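A minimal sketch of that thresholded-SNR selection, assuming a matrix of unit activations over a category's exemplars (the threshold value is a placeholder, not the poster's):

```python
import numpy as np

def extract_ccfs(activations, snr_threshold=2.0):
    """activations: (n_exemplars, n_units) array of a category's unit responses.
    A unit counts as a CCF if it responds strongly (high mean) and reliably
    (low std), i.e., its signal-to-noise ratio exceeds the threshold."""
    mean = activations.mean(axis=0)
    std = activations.std(axis=0) + 1e-8        # avoid division by zero
    snr = mean / std
    return np.flatnonzero(snr > snr_threshold)  # indices of CCF units

# Toy example: 500 exemplars x 256 units of random activations.
ccf_idx = extract_ccfs(np.random.rand(500, 256))
n_ccfs = len(ccf_idx)   # the per-category count used in the correlation analyses
```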

Multi-task fine-tuning: the pre-trained network feeds three classification heads, one per hierarchical level, with weighted losses: 48 subordinate labels (weight 0.7), 16 basic labels (weight 0.24), and 4 superordinate labels (weight 0.06).
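One way the weighted three-head fine-tuning could be set up, as a PyTorch-style sketch (the module and variable names are hypothetical; only the label counts and loss weights come from the poster):

```python
import torch
import torch.nn as nn

class MultiLevelHead(nn.Module):
    """Three classification heads over a shared, pre-trained feature vector."""
    def __init__(self, feat_dim):
        super().__init__()
        self.sub = nn.Linear(feat_dim, 48)    # subordinate
        self.basic = nn.Linear(feat_dim, 16)  # basic
        self.sup = nn.Linear(feat_dim, 4)     # superordinate

    def forward(self, feats):
        return self.sub(feats), self.basic(feats), self.sup(feats)

ce = nn.CrossEntropyLoss()

def multitask_loss(logits, labels, weights=(0.7, 0.24, 0.06)):
    # logits/labels: tuples ordered (subordinate, basic, superordinate)
    return sum(w * ce(lg, lb) for w, lg, lb in zip(weights, logits, labels))

head = MultiLevelHead(feat_dim=4096)
feats = torch.randn(8, 4096)          # stand-in for backbone features
labels = (torch.randint(0, 48, (8,)),
          torch.randint(0, 16, (8,)),
          torch.randint(0, 4, (8,)))
loss = multitask_loss(head(feats), labels)
```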

ImageNet validation set accuracies

              VsNet     Deep-HMAX   AlexNet
  Top-1 Acc   62.73%    60.01%      57.26%
  Top-5 Acc   84.66%    82.6%       80.61%

Positive correlation result: #CCFs vs. time-to-target (Spearman)

              VsNet                   Deep-HMAX*              AlexNet
              ρ        p       Layers ρ        p       Layers ρ        p       Layers
  All 68      .3856    .0012   5      .2015    .0994   5      .3407    .0045   4
  Sub 48      .4595    .001    5      .2672    .0664   5      .3593    .0121   4
  Basic 16    .7251    .002    3,5    .5739    .0201   5      .6937    .0029   4
  Super 4     1.0      .083    1,5    .9487    .1667   1      1.0      .083    3

* Since Deep-HMAX is deeper than the other two models, only its complex-cell layers were used for CCF extraction. Correlations with p > .05 (shown in red on the poster) are not statistically significant.


Fine-tuning validation set accuracies (Top-1, 68-category dataset)

              VsNet     Deep-HMAX   AlexNet
  Sub 48      93.79%    92.96%      92.92%
  Basic 16    96.54%    96.33%      96.25%
  Super 4     98.92%    99.00%      99.17%


Results show that the more biologically-inspired architectures significantly outperform AlexNet. Multi-task fine-tuning was also successful: all three levels achieved near-perfect, human-comparable classification accuracies.

(Figures: Layer-1 feature visualizations for AlexNet, Deep-HMAX, and VsNet.)

Conclusion

Brain-inspired networks are better at object-category classification, and they also better predict the guidance of attention to targets than AlexNet does. Our VsNet is the first deep learning model that is brain-inspired. Future DCNNs should continue to exploit neural constraints so as to better predict behavior.