Learning Convolutional Feature Hierarchies for Visual Recognition Koray Kavukcuoglu, Pierre...

Learning Convolutional Feature Hierarchies for Visual Recognition

Koray Kavukcuoglu, Pierre Sermanet, Y-Lan Boureau, Karol Gregor, Michael Mathieu, Yann LeCun

NIPS 2010

Presented by Bo Chen

Outline

• 1. Drawbacks of Traditional Patch-Based Sparse Coding

• 2. The Proposed Algorithm and Some Details

• 3. Experimental Results

• 4. Conclusions

Convolutional Sparse Coding

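Drawbacks of patch-based sparse coding:

1. The representations of whole images are highly redundant because training and inference are performed at the patch level: overlapping patches are coded independently, so the same structure is represented many times.

2. Inference for a whole image is computationally expensive, since a sparse code has to be found by iterative optimization for every patch.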

Solutions

• 1. Introducing Convolution Operator

• 2. Introducing Nonlinear Encoder Module
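As a reference point, the two ingredients above are often combined into a single training objective. The form below is a sketch in our own notation, recalled from the convolutional predictive sparse decomposition setup rather than copied from the slides: $x$ is the input image, $D_k$ are the $K$ dictionary filters, $z_k$ the corresponding feature maps, $W^k$ the encoder filters, $f$ the encoder nonlinearity and $\lambda$ the sparsity weight; the relative weighting of the terms is an assumption.

$$
\mathcal{L}(x, z, D, W) = \frac{1}{2}\Big\lVert x - \sum_{k=1}^{K} D_k * z_k \Big\rVert_2^2 + \sum_{k=1}^{K} \big\lVert z_k - f(W^k * x) \big\rVert_2^2 + \lambda \lVert z \rVert_1
$$

The first term is the convolutional reconstruction error (solution 1), the second trains the encoder to predict the sparse code in a single feed-forward pass (solution 2), and the last term enforces sparsity.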

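Learning Convolutional Dictionaries

• 1. Boundary Effects Due to Convolutions

Apply a mask on the derivatives of the reconstruction error with respect to the code $z$, so that

$$D^{T} * (x - D * z) \quad \text{is replaced by} \quad D^{T} * \operatorname{mask}(x - D * z),$$

where $D * z$ denotes $\sum_k D_k * z_k$ and mask is a term-by-term multiplier that either puts zeros on the boundaries or gradually scales them down.

• 2. Computationally Efficient Derivative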
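The masked derivative from the boundary-effects item can be illustrated in one dimension. This is a minimal sketch, not the paper's implementation: the helper names (`conv_full`, `corr_valid`, `reconstruct`, `boundary_mask`, `code_gradient`) are hypothetical, signals are plain Python lists, each code map z_k is assumed to have length len(x) - s + 1 so that the full convolution D_k * z_k has the same length as x, and the linear ramp used for the "gradual" mask is our own choice.

```python
def conv_full(d, z):
    """Full 1-D convolution of filter d with code z (length len(z) + len(d) - 1)."""
    out = [0.0] * (len(z) + len(d) - 1)
    for u, zu in enumerate(z):
        for t, dt in enumerate(d):
            out[u + t] += dt * zu
    return out


def corr_valid(r, d):
    """Valid 1-D cross-correlation (the 'D^T *' operation), length len(r) - len(d) + 1."""
    s = len(d)
    return [sum(r[m + t] * d[t] for t in range(s)) for m in range(len(r) - s + 1)]


def reconstruct(filters, codes, n):
    """Sum_k D_k * z_k as an n-sample signal."""
    recon = [0.0] * n
    for d, z in zip(filters, codes):
        for i, v in enumerate(conv_full(d, z)):
            recon[i] += v
    return recon


def boundary_mask(n, width, mode="ramp"):
    """Term-by-term multiplier: 'zero' zeros the outer `width` samples on each side,
    'ramp' scales them down linearly toward the edges (hypothetical choice)."""
    mask = [1.0] * n
    for i in range(min(width, n)):
        w = 0.0 if mode == "zero" else (i + 1) / (width + 1)
        mask[i] = min(mask[i], w)
        mask[n - 1 - i] = min(mask[n - 1 - i], w)
    return mask


def code_gradient(x, filters, codes, mask=None):
    """Gradient of 0.5 * ||x - sum_k D_k * z_k||^2 w.r.t. each z_k:
    -(D_k^T * r) with r = x - sum_k D_k * z_k, or -(D_k^T * (mask * r)) if masked."""
    r = [xi - ri for xi, ri in zip(x, reconstruct(filters, codes, len(x)))]
    if mask is not None:
        r = [mi * ri for mi, ri in zip(mask, r)]
    return [[-g for g in corr_valid(r, d)] for d in filters]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
filters = [[1.0, -1.0], [0.5, 0.5]]   # K = 2 filters of size s = 2
codes = [[0.0] * 5, [0.0] * 5]         # each z_k has len(x) - s + 1 = 5 entries
print(code_gradient(x, filters, codes)[0])                               # [1.0, 1.0, 1.0, 1.0, 1.0]
print(code_gradient(x, filters, codes, boundary_mask(6, 1, "zero"))[0])  # [2.0, 1.0, 1.0, 1.0, -5.0]
```

With the zero mask, the boundary samples of the residual no longer contribute, so the gradient at the edge code units changes while the interior units are unaffected.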

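Learning an Efficient Encoder

1. A New Smooth Shrinkage Operator:

$$\operatorname{sh}_{\beta_k, b_k}(s) = \operatorname{sign}(s) \left[ \frac{1}{\beta_k} \log\!\left(e^{\beta_k b_k} + e^{\beta_k |s|} - 1\right) - b_k \right]$$

where $k$ indexes the feature map, $b_k$ is the threshold and $\beta_k$ controls the smoothness.

2. To aid faster convergence, use the stochastic diagonal Levenberg-Marquardt method to compute a positive diagonal approximation to the Hessian.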
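The smooth shrinkage operator can be written directly in Python. This is a sketch under our own assumptions: the function name `smooth_shrink` is hypothetical, it implements sh_{β,b}(s) = sign(s) · [log(e^{βb} + e^{β|s|} − 1)/β − b] with the per-map parameters β_k and b_k passed in as scalars, and the log is evaluated in a numerically stable form, which is our choice rather than anything stated on the slides.

```python
import math


def smooth_shrink(s, beta, b):
    """Smooth shrinkage sh_{beta,b}(s) = sign(s) * [log(e^{beta*b} + e^{beta*|s|} - 1) / beta - b].

    Assumes beta > 0 and b >= 0. The log is evaluated relative to its largest
    exponent (log-sum-exp style) so that large beta * |s| does not overflow.
    """
    if s == 0:
        return 0.0
    a = abs(s)
    m = beta * max(a, b)
    # exp(beta*b - m) + exp(beta*a - m) - exp(-m) >= 1, so the log is always defined.
    inner = math.exp(beta * b - m) + math.exp(beta * a - m) - math.exp(-m)
    magnitude = (m + math.log(inner)) / beta - b
    return magnitude if s > 0 else -magnitude
```

The operator is odd, equals 0 at s = 0, reduces to the identity when b = 0, and approaches ordinary soft shrinkage sign(s) · max(|s| − b, 0) as β grows, while staying differentiable everywhere, which is what makes it convenient to train with gradient-based methods.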

Patch-Based vs. Convolutional Sparse Modeling

The convolution operator enables the system to model local structures that appear anywhere in the signal. The convolutional dictionary does not waste resources modeling similar filter structures at multiple locations. Instead, it models more orientations, frequencies, and different structures, including center-surround filters, double center-surround filters, and corner structures at various angles.

Multi-Stage Architecture

The convolutional encoder can be used to replace the patch-based sparse coding modules used in multi-stage object recognition architectures. Building on previous findings, in each stage the encoder is followed by absolute value rectification, contrast normalization, and average subsampling.

Absolute Value Rectification: a simple pointwise absolute value function applied to the output of the encoder.

Contrast Normalization: reduces the dependencies between components (feature maps). When used between layers, the mean and standard deviation are calculated across all feature maps over a 9 × 9 spatial neighborhood.

Average Pooling: a spatial pooling operation applied to each feature map independently.
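The per-stage post-processing chain can be sketched with plain Python lists. Assumptions: the helper names are hypothetical; feature maps are given as K lists of H × W rows; contrast normalization uses uniform weights over all K maps and a k × k window truncated at the borders, with a small floor `eps` on the standard deviation (the original systems may use Gaussian weighting and a different floor); pooling uses only windows that fit entirely inside the map.

```python
import math


def abs_rectify(maps):
    """Pointwise absolute value on every feature map."""
    return [[[abs(v) for v in row] for row in fm] for fm in maps]


def contrast_normalize(maps, k=9, eps=1e-6):
    """Subtract the local mean and divide by the local std, both taken over all K
    maps and a k x k spatial window (uniform weights, truncated at the borders)."""
    K, H, W = len(maps), len(maps[0]), len(maps[0][0])
    h = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(K)]
    for i in range(H):
        for j in range(W):
            vals = [maps[c][yy][xx]
                    for c in range(K)
                    for yy in range(max(0, i - h), min(H, i + h + 1))
                    for xx in range(max(0, j - h), min(W, j + h + 1))]
            mu = sum(vals) / len(vals)
            sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))
            for c in range(K):
                out[c][i][j] = (maps[c][i][j] - mu) / max(sd, eps)
    return out


def average_pool(fm, size, stride):
    """Average over size x size windows that fit inside the map, moved by `stride`."""
    H, W = len(fm), len(fm[0])
    return [[sum(fm[i + a][j + b] for a in range(size) for b in range(size)) / (size * size)
             for j in range(0, W - size + 1, stride)]
            for i in range(0, H - size + 1, stride)]


def stage_postprocess(encoder_maps, k=9, pool_size=2, pool_stride=2):
    """Rectify -> contrast-normalize -> average-pool each map independently."""
    normalized = contrast_normalize(abs_rectify(encoder_maps), k)
    return [average_pool(fm, pool_size, pool_stride) for fm in normalized]
```

Normalization mixes information across maps (its statistics are shared by all K maps at a location), whereas pooling never does, which mirrors the distinction drawn in the two component descriptions.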

Experiment 1: Object Recognition on the Caltech 101 Dataset

Preprocessing: 1. 30/30 training/testing images per class; 2. resize to 151x143; 3. local contrast normalization.

Unsupervised Training: Berkeley segmentation dataset

Architecture:
First layer: 64 filters of 9x9; pooling over a 10 × 10 area with a 5-pixel stride.
Second layer: 256 filters of 9x9, where each dictionary element is constrained to connect to 16 dictionary elements from the first layer; pooling over a 6 × 6 area with stride 4.
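The Caltech 101 architecture can be sanity-checked with a little arithmetic on feature-map sizes and filter counts. This assumes "valid" convolutions (no padding), pooling windows that fit entirely inside the map, and a single-channel input; the helper names are hypothetical and the original implementation may pad differently, so the numbers are illustrative. Which side of the 151x143 image is the height is not stated, but swapping them does not change the final 4 × 4 grid.

```python
def conv_out(n, k):
    """Side length after a 'valid' k x k convolution (no padding assumed)."""
    return n - k + 1


def pool_out(n, size, stride):
    """Side length after pooling with a size x size window and the given stride."""
    return (n - size) // stride + 1


def stage(side, filter_size, pool_size, pool_stride):
    """One stage: convolutional encoder followed by average pooling."""
    return pool_out(conv_out(side, filter_size), pool_size, pool_stride)


def caltech_shapes(a=151, b=143):
    # Stage 1: 64 filters of 9x9, pooling 10x10 with stride 5.
    s1 = (64, stage(a, 9, 10, 5), stage(b, 9, 10, 5))
    # Stage 2: 256 filters of 9x9, pooling 6x6 with stride 4.
    s2 = (256, stage(s1[1], 9, 6, 4), stage(s1[2], 9, 6, 4))
    return s1, s2


def stage2_weights(n_filters=256, fan_in_maps=16, k=9):
    """Filter weights when each output map connects to only `fan_in_maps` input maps."""
    return n_filters * fan_in_maps * k * k


s1, s2 = caltech_shapes()
print(s1, s2)                                            # (64, 27, 26) (256, 4, 4)
print(s2[0] * s2[1] * s2[2])                             # 4096
print(stage2_weights(), stage2_weights(fan_in_maps=64))  # 331776 1327104
```

Under these assumptions the second stage yields 256 maps of 4 × 4, i.e. a 4096-dimensional feature vector, and the 16-map connection table cuts the second-layer filter weights by a factor of four relative to full connectivity to all 64 first-layer maps.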

Recognition Accuracy

One Layer

Two Layers

Ours: 65.8% (0.6)

Pedestrian Detection (1)

Original dataset: positive = 2416; negative = 1218
Augmented: positive = 11370 (1000); negative = 9001 (1000)

Layer-1: 32 7x7; Layer-2: 64 7x7; Pooling: 2x2

Pedestrian Detection (2)

Conclusions

• 1. Convolutional training of feature extractors reduces the redundancy among filters compared with filters obtained from patch-based models.

• 2. Two different convolutional encoder functions were introduced for efficient feature extraction, which is crucial for using sparse coding in real-world applications.

• 3. The proposed sparse modeling systems have been applied in a successful multi-stage architecture to object recognition and pedestrian detection problems and performed comparably to similar systems.
