Generating the Features for Category Representation using a Deep Convolutional Neural Network
Chen-Ping Yu1 ([email protected]), Justin Maxfield2, and Gregory J. Zelinsky1,2
1Dept. of Computer Science, 2Dept. of Psychology, Stony Brook University

Introduction
Question: The search for object categories (categorical search) is characterized by a subordinate-level advantage in target guidance (e.g., time to target fixation) and a basic-level advantage in target verification. What are the underlying features that make these effects possible?
Exp #1: Previous work [16] modeled categorical guidance and verification over three levels of a category hierarchy (subordinate, basic, and superordinate) using category-consistent features built from SIFT and Bag-of-Words. How would fixation prediction using these biologically-uninspired features compare to prediction using features learned by a CNN modeled on the ventral visual stream?
Exp #2: How do fixation predictions from our Ventral-Stream Network (VsNet) model compare to those from the AlexNet and Deep-HMAX DCNNs?
References & Acknowledgments
[1] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
[2] A. Krizhevsky. One Weird Trick for Parallelizing Convolutional Neural Networks. arXiv, 2014.
[3] T. Serre, A. Oliva, and T. Poggio. A Feedforward Architecture Accounts for Rapid Categorization. PNAS, 2007.
[4] D. Kravitz, K. Saleem, C. Baker, L. Ungerleider, and M. Mishkin. The Ventral Visual Pathway: An Expanded Neural Framework for the Processing of Object Quality.
Trends in Cognitive Sciences, 2013.
[5] D. Boussaoud, R. Desimone, and L. Ungerleider. Visual Topography of Area TEO in the Macaque. Journal of Comparative Neurology, 1991.
[6] S. Kastner, P. Weerd, M. Pinsk, M. Elizondo, R. Desimone, and L. Ungerleider. Modulation of Sensory Suppression: Implications for Receptive Field Sizes in the Human
Visual Cortex. Journal of Neurophysiology, 2001.
[7] A. Smith, A. Williams, and M. Greenlee. Estimating Receptive Field Size from fMRI Data in Human Striate and Extrastriate Visual Cortex. Cerebral Cortex, 2001.
[8] G. Rousselet, S. Thorpe, and M. Fabre-Thorpe. How Parallel is Visual Processing in the Ventral Pathway? Trends in Cognitive Sciences, 2004.
[9] B. Harvey and S. Dumoulin. The Relationship between Cortical Magnification Factor and Population Receptive Field Size in Human Visual Cortex: Constancies in
Cortical Architecture. Journal of Neuroscience, 2011.
[10] J. Freeman and E. Simoncelli. Metamers of the Ventral Stream. Nature Neuroscience, 2011.
[11] M. Jenkin and L. Harris. Vision and Attention. Springer Science & Business Media, 2013. (Book)
[12] R. Greger and U. Windhorst. Comprehensive Human Physiology, Vol. 1: From Cellular Mechanisms to Integration. Springer, 1996. (Book)
[13] K. Tanaka. Mechanisms of Visual Object Recognition: Monkey and Human Studies. Current Opinion in Neurobiology, 1997.
[14] J. Larsson and D. Heeger. Two Retinotopic Visual Areas in Human Lateral Occipital Cortex. Journal of Neuroscience, 2006.
[15] D. Felleman and D. Van Essen. Distributed Hierarchical Processing in the Primate Cerebral Cortex. Cerebral Cortex, 1991.
[16] C.-P. Yu, J. Maxfield, and G. Zelinsky. Searching for Category-Consistent Features: A Computational Approach to Understanding Visual Category Representation.
Psychological Science, 2016.
[17] C. Cadieu, H. Hong, D. Yamins, N. Pinto, D. Ardila, E. Solomon, N. Majaj, and J. DiCarlo. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core
Visual Object Recognition. PLOS Computational Biology, 2014.
This work was supported by NIMH grant R01-MH063746 and NSF grants IIS-1111047 and IIS-1161876 to G.J.Z.
Categorical Search & Category-Consistent Features (CCFs)
Hierarchical categories: we collected a 68-category hierarchical dataset with 48 subordinate, 16 basic, and 4 superordinate categories.
Dataset: 500 training and 50 validation images per category, for a total of 26,400 images over the 48 subordinate categories (the basic and superordinate categories are derived from the subordinates).
Behavioral Data: Time-to-target (guidance) and
subsequent verification times were measured
using a categorical search task (N=26).
Category-Consistent Features (CCFs): visual features, learned by an unsupervised generative model, that represent an object category; they are defined as the common features that occur frequently and reliably across a category's exemplars.
SIFT BoW-CCF: previous work [16] used a Bag-of-Words model with SIFT features as the base feature pool (1064-d); features with a high signal-to-noise ratio (SNR) were identified as the CCFs. This model predicted guidance and verification across hierarchical levels, but not at the level of individual categories!
[Figure: Bag-of-Words histogram for the taxi category (Taxi SIFT BoW) and the subset selected as Taxi SIFT CCFs]
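For concreteness, here is a minimal NumPy sketch of the SNR selection rule described above; the toy data, the fixed threshold of 2.0, and the function name are our illustrative assumptions, not the exact pipeline of [16].

```python
import numpy as np

def select_ccfs(bow_hists, snr_threshold=2.0):
    """Select category-consistent features (CCFs) from BoW histograms.

    bow_hists: (n_exemplars, n_words) SIFT Bag-of-Words histograms for
    one category (1064 visual words in [16]). A word is a CCF when it
    fires frequently (high mean) and reliably (low variability), i.e.,
    when its signal-to-noise ratio exceeds the threshold.
    """
    mu = bow_hists.mean(axis=0)           # frequency across exemplars
    sigma = bow_hists.std(axis=0) + 1e-8  # reliability (avoid divide-by-0)
    return np.where(mu / sigma > snr_threshold)[0]

# Toy usage: 500 exemplars, 1064 words; words 0-9 fire often and reliably.
rng = np.random.default_rng(0)
hists = rng.random((500, 1064))
hists[:, :10] += 5.0          # consistent "taxi-like" words
print(select_ccfs(hists))     # -> [0 1 2 ... 9]
```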
CNN-CCFs: CNNs learn hierarchical features directly from images. Can CCFs extracted from a CNN do better at predicting guidance and verification performance? Yes, even at the level of individual object categories!

Deep Convolutional Neural Network Models (DCNNs)
Biological plausibility: most CNNs are trained to maximize classification accuracy, but they have also been used to predict behavioral and neural data [17]. We evaluated our VsNet against AlexNet and a deep, biologically-inspired version of HMAX [3] (Deep-HMAX).
[Architecture diagram: Deep-HMAX (inspired by [3]), ~76M parameters, 1,760 conv units. Simulated areas and RF sizes: V1 (7 px = 1.4º), V2 (13 px = 2.6º), V4s (17 px = 3.4º), V4c (29 px = 5.8º), TEOs1 (37 px = 7.4º), TEOs2a (21 px = 4.2º), TEOs2b (61 px = 12.2º), TEOc (77 px = 15.4º), TE1 (77 px = 15.4º), TE2 (37/93 px = 7.4º/18.6º). BN+ReLU after each conv stage, max/average pooling between stages, and a by-pass connection; a final DepthConcat's 640 channels are reduced to 512 by a 1×1 conv (BN, ReLU) before the fully connected layers and class labels.]
[Architecture diagram: Ventral-Stream Network (VsNet), ~72M parameters, 835 conv units. Each area lists its theoretical RF-size range and the RF size at 5.5º eccentricity, then its parallel kernel populations as count k×k,stride (simulated RF size):
V1 (0.25º~2º; <2º): 21 3×3,2 (0.6º), 22 5×5,2 (1.0º), 21 7×7,2 (1.4º); BN, ReLU; DepthConcat(64)
V2 (0.5º~4º; 2º~4º): 22 3×3,1 (1.4º~2.2º), 32 5×5,1 (2.2º~3.0º), 32 7×7,1 (3.0º~3.8º); BN, ReLU; DepthConcat(86)
hV4+LO1+LO2 (1.5º~8º; 4º~6º): 76 3×3,1 (3.8º~6.2º), 52 5×5,1 (5.4º~7.8º); BN, ReLU; DepthConcat(128); 3×3,2 max-pool
TEO (~7+º; >7º): 96 3×3,1 (5.4º~9.4º), 52 5×5,1 (7.0º~11º); BN, ReLU; DepthConcat(224); 3×3,2 max-pool
TE (2.5º~70º; ?º): 111 3×3,1 (10.2º~15.8º), 111 5×5,1 (13.4º~19.0º), 111 7×7,1 (16.6º~22.2º); BN, ReLU; DepthConcat(333); 5×5,4 max-pool, then fully connected layers and class labels.]
[Architecture diagram: AlexNet [1,2], ~61M parameters, 1,152 conv units.
Conv1: 64 11×11,4, BN, ReLU (RF 11 px = 2.2º); 3×3,2 max-pool
Conv2: 192 5×5,1, BN, ReLU (RF 51 px = 10.2º); 3×3,2 max-pool
Conv3: 384 3×3,1, BN, ReLU (RF 99 px = 19.8º)
Conv4: 256 3×3,1, BN, ReLU (RF 131 px = 26.2º)
Conv5: 256 3×3,1, BN, ReLU (RF 163 px = 32.6º); 3×3,2 max-pool, then fully connected layers and class labels.]
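The RF sizes annotated on the diagrams follow from standard receptive-field arithmetic through the conv/pool stack. Here is a small sketch (the function and layer list are ours) that reproduces the AlexNet numbers above, using the poster's 1º = 5 px conversion:

```python
def receptive_fields(stages):
    """Accumulate RF size through (name, kernel, stride) stages.

    rf grows by (kernel - 1) * jump at each stage, where jump is the
    product of all preceding strides; unnamed stages are pooling.
    """
    rf, jump, out = 1, 1, []
    for name, k, s in stages:
        rf += (k - 1) * jump
        jump *= s
        if name:
            out.append((name, rf))
    return out

alexnet = [("conv1", 11, 4), ("", 3, 2),   # 3x3,2 max-pool
           ("conv2", 5, 1), ("", 3, 2),
           ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1)]
for name, rf in receptive_fields(alexnet):
    print(f"{name}: {rf} px = {rf / 5:.1f} deg")
# conv1: 11 px = 2.2 deg ... conv5: 163 px = 32.6 deg, matching the diagram.
```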
Ventral-Stream Network Design Principles
[Figure from [4]: weak and strong by-pass connections in the ventral visual pathway]
Ventral-Stream area RF sizes: We pooled the human RF-size estimates from [4,5,6,7,8,9,10,11] for each area, based on 1º = 5 pixels.
Variable RF sizes within an area: because neurons within each ventral-stream area span a range of RF sizes, each area (conv layer) contains parallel populations of convolutional kernels of different sizes. The number of kernels in each parallel population depends on how closely the kernel's simulated RF size matches the area's estimated RF sizes: the closer the match, the larger the share of units (see the sketch below).
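A minimal PyTorch sketch of one such area (the module name and padding choice are ours; the kernel counts and sizes are V1's, from the diagram above): parallel convolutions with different kernel sizes, padded to a common spatial size and depth-concatenated.

```python
import torch
import torch.nn as nn

class ParallelRFConv(nn.Module):
    """One VsNet-style area: parallel conv kernels of different sizes
    (i.e., different simulated RF sizes), depth-concatenated."""

    def __init__(self, in_ch, specs, stride=1):
        # specs: list of (num_kernels, kernel_size) pairs
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, n, k, stride=stride, padding=k // 2),
                nn.BatchNorm2d(n),
                nn.ReLU(inplace=True),
            )
            for n, k in specs
        )

    def forward(self, x):
        # Padding by k // 2 keeps every branch spatially aligned, so
        # DepthConcat reduces to a channel-wise concatenation.
        return torch.cat([b(x) for b in self.branches], dim=1)

# V1 from the diagram: 21 3x3, 22 5x5, 21 7x7 kernels, stride 2 -> 64 channels.
v1 = ParallelRFConv(3, [(21, 3), (22, 5), (21, 7)], stride=2)
print(v1(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 64, 112, 112])
```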
By-pass connections: V4 receives weak (foveal) by-pass inputs from V1 [12], while TEO and TE receive strong by-pass inputs from V2 and V4, respectively [13].
Number of neurons (conv kernels) per area: in biological visual systems, neurons with the same selectivity are tiled across the visual field, much like a convolution with a single filter, especially in the early areas. The number of kernels per conv layer is therefore estimated from each area's cortical surface area [14,15], such that

#conv_kernels ∝ Surface_Area / Duplication_Factor, where Duplication_Factor = log(Visual_Area / RF_Size²) and Visual_Area = 1000.
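Worked through in code (a sketch only; the surface-area value is a placeholder, since the actual values come from [14,15]):

```python
import math

def duplication_factor(rf_size, visual_area=1000.0):
    """log(Visual_Area / RF_Size^2): how many times a filter must be
    duplicated (tiled) to cover the visual field."""
    return math.log(visual_area / rf_size ** 2)

def n_kernels(surface_area, rf_size):
    """#conv_kernels is proportional to cortical surface area divided
    by the duplication factor."""
    return surface_area / duplication_factor(rf_size)

# Same surface area, different RF sizes: small-RF areas spend more of
# their neurons duplicating each filter across the field, so they get
# fewer distinct kernels (surface_area=1.0 is a placeholder).
print(n_kernels(1.0, rf_size=1.4))    # V1-like RF  -> ~0.16 (fewer kernels)
print(n_kernels(1.0, rf_size=12.2))   # TE-like RF  -> ~0.53 (more kernels)
```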
hV4+LO1+LO2: the literature suggests that object shape and boundaries are also processed by LO1 and LO2 [14]; we therefore combined them with hV4 into a single layer.
Experiments and Results

Negative Correlation Result: #CCF vs Time-to-target
              VsNet                      Deep-HMAX*                 AlexNet
          Spearman ρ  p-val  Layers    ρ       p-val   Layers    ρ       p-val   Layers
All 68     -.3189     .008     2      -.2089   .0873     2      -.0364   .7679     5
Sub 48     -.3674     .0102    2      -.1862   .2052     2      -.0317   .8306     1
Basic 16   -.2336     .3838    1      -.2492   .352      1       n/a     n/a      n/a
Super 4    -.6325     .5       1      -.4472   .6667     1      -.8      .3333     2
1. Pre-train the networks on ImageNet for classification.
2. Fine-tune the networks on our dataset using multi-task learning (see the diagram below).
CNN-CCFs: for each object category, extract the units that are highly and reliably activated by its exemplars (thresholded signal-to-noise ratio). More CCFs = higher specificity = faster time-to-target [16].
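This is the same SNR rule as in the BoW sketch above, now applied to conv-unit activations. A hedged sketch follows; pooling each unit's feature map to a single scalar with a global average is our simplifying assumption.

```python
import torch

def cnn_ccfs(activations, snr_threshold=2.0):
    """activations: (n_exemplars, n_units, H, W) feature maps for one
    category from one conv layer. Returns indices of CCF units."""
    # One scalar response per unit per exemplar (global average pooling).
    resp = activations.mean(dim=(2, 3))            # (n_exemplars, n_units)
    mu, sigma = resp.mean(dim=0), resp.std(dim=0)  # across exemplars
    snr = mu / (sigma + 1e-8)                      # high AND reliable units
    return (snr > snr_threshold).nonzero(as_tuple=True)[0]
```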
[Diagram: multi-task fine-tuning. The pre-trained network feeds three classification heads: 48 subordinate labels (loss weight 0.7), 16 basic labels (loss weight 0.24), and 4 superordinate labels (loss weight 0.06).]
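A minimal sketch of that multi-task objective (assuming a shared backbone that emits a 512-d feature vector; the head and variable names are ours):

```python
import torch
import torch.nn as nn

# Three classification heads over a shared, ImageNet-pre-trained backbone.
heads = nn.ModuleDict({
    "sub":   nn.Linear(512, 48),  # 48 subordinate labels,  loss weight 0.70
    "basic": nn.Linear(512, 16),  # 16 basic labels,        loss weight 0.24
    "super": nn.Linear(512, 4),   # 4 superordinate labels, loss weight 0.06
})
weights = {"sub": 0.70, "basic": 0.24, "super": 0.06}
ce = nn.CrossEntropyLoss()

def multitask_loss(features, labels):
    """features: (batch, 512) backbone output; labels: level -> (batch,) ids."""
    return sum(weights[k] * ce(heads[k](features), labels[k]) for k in heads)
```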
ImageNet validation set accuracies
             VsNet    Deep-HMAX  AlexNet
Top-1 acc    62.73%   60.01%     57.26%
Top-5 acc    84.66%   82.6%      80.61%
Positive Correlation Result: #CCF vs Time-to-target
              VsNet                      Deep-HMAX*                 AlexNet
          Spearman ρ  p-val  Layers    ρ       p-val   Layers    ρ       p-val   Layers
All 68      .3856     .0012    5       .2015   .0994     5       .3407   .0045     4
Sub 48      .4595     .001     5       .2672   .0664     5       .3593   .0121     4
Basic 16    .7251     .002    3,5      .5739   .0201     5       .6937   .0029     4
Super 4    1.0        .083    1,5      .9487   .1667     1      1.0      .083      3

* Since Deep-HMAX is "deeper" than the other two models, only its complex-cell layers were used for CCF extraction.
Insignificant correlations (p > 0.05) are colored in red.
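The ρ and p-val columns in both tables are Spearman rank correlations between the per-category CCF count and the per-category mean behavioral time; a sketch with SciPy (toy arrays, since the behavioral data are not reproduced here):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# One value per category (68 in the "All" row): CCF count at the chosen
# layer(s), and mean time-to-target from the search task (toy data here).
n_ccfs = rng.integers(5, 200, size=68)
time_to_target = rng.normal(600.0, 80.0, size=68)  # ms, illustrative

rho, p = spearmanr(n_ccfs, time_to_target)
print(f"Spearman rho = {rho:.4f}, p = {p:.4f}")
```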
Fine-tuning validation set accuracies (68-category dataset, top-1)
             VsNet    Deep-HMAX  AlexNet
Sub 48       93.79%   92.96%     92.92%
Basic 16     96.54%   96.33%     96.25%
Super 4      98.92%   99.00%     99.17%
On ImageNet, the more biologically-inspired architectures significantly outperform AlexNet. Multi-task fine-tuning also succeeded: all three hierarchical levels achieved near-perfect, human-comparable classification accuracies.
[Figure: Layer 1 feature visualizations for AlexNet, Deep-HMAX, and VsNet]
Conclusion
Brain-inspired networks are better at object-category classification, and they also better predict the guidance of attention to targets than AlexNet. VsNet is, to our knowledge, the first DCNN whose architecture is explicitly constrained by the individual areas of the ventral visual stream. Future DCNNs should continue to exploit neural constraints so as to better predict behavior.