Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

Hirokatsu KATAOKA, Kenji Iwata, Yutaka SATOH

National Institute of Advanced Industrial Science and Technology (AIST)

http://www.hirokatsukataoka.net/

arXiv preprint arXiv:1509.07627 http://arxiv.org/abs/1509.07627

Feature Evaluation
•  Significant task in computer vision
–  Based on DeCAF [Donahue+, ICML2014], we evaluate several CNN features + SVM classifier
–  The representative architectures: AlexNet [Krizhevsky+, NIPS2012] & VGGNet [Simonyan+, ICLR2015]
–  Basic Idea 1: Which layer in a CNN architecture gives the better feature?
–  Basic Idea 2: Mid- & high-level CNN features should be concatenated! (e.g. Layer 3 + Layer 5 + Layer 7)

CNN Architecture & Feature Extraction
•  AlexNet & VGGNet
–  AlexNet: 8-layer architecture
–  VGGNet: 16-layer architecture (the output of each pooling layer and of the last 2 FC layers is used as a feature vector)
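Using a layer output "as a feature vector" amounts to flattening its activation. The sketch below illustrates this with random stand-in arrays; the shapes are an assumption (the standard VGG-16 pool5 and fc6 outputs for a 224x224 input), and `flatten_activation` is a hypothetical helper, not code from the slides.

```python
import numpy as np

def flatten_activation(activation: np.ndarray) -> np.ndarray:
    """Flatten one layer's activation (any shape) into a 1-D feature vector."""
    return np.asarray(activation, dtype=np.float32).reshape(-1)

# Random stand-ins for real network activations (assumed shapes, see lead-in).
rng = np.random.default_rng(0)
pool5 = rng.standard_normal((512, 7, 7))  # channels x height x width
fc6 = rng.standard_normal(4096)           # fully-connected output

pool5_vec = flatten_activation(pool5)
fc6_vec = flatten_activation(fc6)
print(pool5_vec.shape, fc6_vec.shape)  # (25088,) (4096,)
```

Pooling-layer vectors are much longer than FC vectors, which is one reason a dimensionality reduction step is useful before the classifier.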

[Figure: AlexNet and VGGNet architecture diagrams. Legend: image input, convolutional layer, max-pooling layer, fully-connected layer, softmax layer. Layer1 to Layer7 are labeled on each network.]

Experiment
•  Settings
–  Layer: 3 – 7 (middle and deeper layers)
   •  Conv., pooling and fully-connected layers
–  Concatenation and transformation
   •  Layer 345, 456, 567, 357
   •  Principal component analysis (PCA): 1,500 dims
–  Classifier
   •  Support vector machine (SVM)
   •  The parameters are based on DeCAF [Donahue+, ICML2014]
•  Datasets
–  Daimler pedestrian benchmark dataset (pedestrian detection) [Munder+, TPAMI2006]
–  Caltech 101 dataset (object classification) [Fei-Fei+, CVPRW2004]
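The PCA transformation in the settings above can be sketched with NumPy as follows. This is a minimal illustration on random stand-in data with small sizes (64 input dims reduced to 16, rather than 1,500); `fit_pca` and `apply_pca` are hypothetical helper names, and the SVM step is omitted.

```python
import numpy as np

def fit_pca(train: np.ndarray, n_components: int):
    """Learn the mean and the top principal directions from the rows of `train`."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project features onto the learned principal directions."""
    return (features - mean) @ components.T

# Random stand-ins for CNN features of training and test images.
rng = np.random.default_rng(0)
train = rng.standard_normal((200, 64))
test = rng.standard_normal((50, 64))

# Fit on the training split only, then apply the same projection to both splits.
mean, components = fit_pca(train, n_components=16)
train_reduced = apply_pca(train, mean, components)
test_reduced = apply_pca(test, mean, components)
print(train_reduced.shape, test_reduced.shape)  # (200, 16) (50, 16)
```

The number of components cannot exceed the smaller of the number of training samples and the input dimension, so a 1,500-dim projection needs at least 1,500 training vectors.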

Results on the Daimler dataset
•  Daimler pedestrian benchmark dataset
–  VGGNet Layer 5 (original vector) achieves the best rate (99.35%)
–  In AlexNet, Layer 3 with PCA achieves the best rate (98.71%)

Mid-level layers tend to give better rates on the pedestrian detection data

Results on the Caltech 101 dataset
•  Caltech 101 dataset
–  VGGNet Layer 5 (original vector) achieves the best rate (91.80%)
–  In AlexNet, Layer 5 with PCA achieves the best rate (78.37%)

The layer just before the FC layers performs well in object classification

Feature Concatenation
•  Three-layer connection with PCA
–  Layer 345, 456, 567, 357
–  4,500 dimensions (1,500 dims for each vector)
–  Left: Daimler
–  Right: Caltech 101

[Figure: results of the concatenated features. Left: Daimler; right: Caltech 101]

VGGNet Layer 567 is the most effective combination
Pedestrian detection: mid-level feature
Object classification: high-level feature
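The dimension arithmetic of the three-layer connection (3 × 1,500 = 4,500) can be checked with a short NumPy sketch. The per-layer vectors here are random stand-ins for PCA-reduced features, and `LAYER_SETS` and `concatenate_layers` are hypothetical names chosen for illustration.

```python
import numpy as np

# The four layer combinations listed on the slide.
LAYER_SETS = {"345": (3, 4, 5), "456": (4, 5, 6), "567": (5, 6, 7), "357": (3, 5, 7)}

def concatenate_layers(reduced: dict, layers: tuple) -> np.ndarray:
    """Join the per-layer feature matrices side by side, one row per image."""
    return np.concatenate([reduced[layer] for layer in layers], axis=1)

n_images, dims = 8, 1500
rng = np.random.default_rng(0)
# Random stand-ins for the PCA-reduced features of Layers 3 to 7.
reduced = {layer: rng.standard_normal((n_images, dims)) for layer in range(3, 8)}

combined = {name: concatenate_layers(reduced, layers) for name, layers in LAYER_SETS.items()}
print(combined["567"].shape)  # (8, 4500)
```

Each combined matrix can then be passed to the classifier in place of a single-layer feature.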

Conclusion
•  Feature evaluation with AlexNet & VGGNet
–  VGGNet is better than AlexNet
–  Mid-level features are good for pedestrian detection, and high-level features are good for the object classification task
–  Concatenating the VGGNet 5th pooling layer and last 2 FC layers is the best setting on the Daimler pedestrian benchmark and Caltech 101 datasets
–  PCA is an effective transformation for CNN features