Upload
nlabutokyo
View
419
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
IEEE International Conference on Multimedia & Expo 2013
Augmenting Descriptors for Fine-grained Visual Categorization Using Polynomial Embedding
Hideki Nakayama
Graduate School of Information Science and Technology The University of Tokyo
Outline
Background and Motivation Our solution
Polynomial embedding
Experiment Fine-grained categorization Comparison with state-of-the-art
Conclusion
2
Fine-grained visual categorization (FGVC) Distinguish hundreds of fine-grained objects
under a certain domain (e.g., species of animals and plants)
Complement to traditional object recognition problems
Caltech-256 [Griffin et al., 2007]
Caltech-Bird-200 [Welinder et al., 2010]
Generic Object Recognition FGVC
Yellow Warbler
Pririe
Warbler Pine Warbler Airplane Monitor Dog
V.S.
3
Motivation
We need highly discriminative features to distinguish visually very similar categories
Especially at local level.
4
Two basic ideas
1. Co-occurrence (correlation) of neighboring local descriptors Shaplet [Sabzmeydani et al., 2007] Covariance feature [Tuzel et al., 2006]
GGV [Harada et al., 2012]
Expected to capture middle-level local information Results in high-dimensional local features
2. State-of-the-art bag-of-words representation
Based on higher-order statistics of local features Fisher vector [Perronnin et al., 2010]
VLAD [Jegou et al., 2010]
Remarkably high-performance, enables linear classification Dimensionality increases in linear to the size of local features
☹ ☺
☺ ☹
(N: number of visual words, D: size of local features) ND
2ND
conflict
5
Our approach
Compress polynomials of neighboring local features vector with supervised dimensionality reduction Discriminative latent descriptor
Encode by means of bag-of-words (Fisher vector) Logistic regression classifier
1,000~1,0000 dim 64 dim Descriptor
(e.g. SIFT)
Dense sampling
polynomialvectors
latentdescriptor
categorylabel
CCA
(training)
Fishervector
logisticregressionclassifier
6
★● ●
Exploit co-occurrence information
e.g. SIFT
( )( )( )
=
+
−
Tyxyx
Tyxyx
Tyxyx
yx
yx
Vec
Vec
upperVec
),(),(
),(),(
),(),(
),(
2),(
δ
δ
vv
vv
vv
v
p
),( yxv),( yx σ−v ),( yx σ+vNeighbor
(Left) Neighbor (Right)
Descriptor at target position
×
×
Polynomial Vector
a Matrix of vector flattened:)(Vec
7
Exploit co-occurrence information More spatial information can be integrated with more
neighbors (but become high-dimensional)
( )( )( )
=
+
−
Tyxyx
Tyxyx
Tyxyx
yx
yx
Vec
Vec
upperVec
),(),(
),(),(
),(),(
),(
2),(
δ
δ
vv
vv
vv
v
p
( )
= T
yxyx
yxyx upperVec ),(),(
),(0),( vv
vp
( )( )( )( )( )
=
+
+
−
−
Tyxyx
Tyxyx
Tyxyx
Tyxyx
Tyxyx
yx
yx
Vec
Vec
Vec
Vec
upperVec
),(),(
),(),(
),(),(
),(),(
),(),(
),(
4),(
δ
δ
δ
δ
vv
vv
vv
vv
vv
v
p
★
★● ●
★● ●
●
●
0-neighbor
2-neighbors
4-neighbors
2,144dim
10,336dim
18,528dim
8
Descriptor(e.g. SIFT)
Dense sampling
polynomialvectors
latentdescriptor
Fishervector
logisticregressionclassifier
categorylabel
CCA
(training)
9
Patch feature and label pairs Category label: Binary occurrence vector
Strong supervision assumption
Most patches should be related to the content (category) (Somewhat) justified for FGVC considering the applications Users will target the object, sometimes can give segmentation
Supervised dimensionality reduction
Allium triquetrum
0
010
0
010
0
010
0
010
(We do not perform manual segmentation in this work, though) 10
Canonical Correlation Analysis (CCA) [Hotelling, 1936]
( ) ( ) tslltpps
lp
andbetweenncorrelatiothemaximizethat,tionstransformalinearfinds CCA
featurelabel: ls),(polynomiafeaturepatch:
−=−= TT BA
Supervised dimensionality reduction
( )( )IBCBBCBCCC
IACAACACCC
llT
llplpplp
ppT
pplpllpl
=Λ=
=Λ=−
−
21
21
nscorrelatiocanonical:matricescovariance:
ΛC
p l
Canonical space
s t
s
t
Image feature Labels feature
( )pps −= TALatent descriptor
1,000~1,0000 dim
64 dim
(discriminative)
11
Descriptor(e.g. SIFT)
Dense sampling
polynomialvectors
latentdescriptor
Fishervector
logisticregressionclassifier
categorylabel
CCA
(training)
12
Fisher Vector [Perronnin et al., 2010]
State-of-the-art bag-of-words encoding method using higher-level statistics of descriptors (mean and var)
http://www.image-net.org/challenges/LSVRC/2010/ILSVRC2010_XRCE.pdf
13
Experiments
Experimental setup
FGVC Datasets Oxford-Flowers-102 Caltech-Bird-200
Descriptors SIFT, C-SIFT, Opponent-SIFT, Self Similarity Compressed into 64dim using several methods
Fisher Vector 64 Gaussians (visual wods) Global + 3 horizontal spatial regions
Classifier Logistic regression
Evaluation Mean classification accuracy
Flowers
Birds
15
Results: comparison with PCA and CCA
Our method substantially improves performance for all descriptors
Just applying CCA to concatenated neighbors does not improve performance Polynomial embedding makes sense (non-linear convolution)
0
10
20
30
40
50
60
70
80
90
SIFT C-SIFT Opp.-SIFT SSIM
PCA (baseline)
CCA (4-neighbors)
PolEmb (4-neighbors)
0
5
10
15
20
25
SIFT C-SIFT Opp.-SIFT SSIM
Flower Bird
Classification performance (%) with different embedding methods (all 64dim)
Baseline (PCA)
Ours
16
CCA without
Pol.
Results: number of neighbors
Including more neighbors improves performance
0
10
20
30
40
50
60
70
80
90
SIFT C-SIFT Opp.-SIFT SSIM0
5
10
15
20
25
SIFT C-SIFT Opp.-SIFT SSIM
Classification performance (%) of our method with different number of neighbors
Flower Bird
★
★● ●
★● ●
●
●
17
Comparison with other work
Our final system Combine four descriptors in late-fusion
approach (SIFT, C-SIFT, Opp.-SIFT, SSIM)
Sum of log-likelihoods output by each classifier (weighted by its individual confidence)
Descriptor 1(e.g. SIFT)
Dense sampling
polynomialvectors
categorylabel
CCA
latentdescriptor
Fishervector
logisticregressionclassifier
logisticregressionclassifier
logisticregressionclassifier
(training)
Descriptor 2
Descriptor K
+
・・・・・・
Same as above
Same as above
Alliumtriquetrum
19
Comparison on FGVC datasets
Our method outperforms previous work on bird and flower datasets
For the bird dataset, [32] uses the bounding box only for training images, therefore the result is not directly comparable to ours.
(PCA) (PolEmb) (PCA+PolEmb)
← baseline
Mean classification accuracy (%)
20
ImageCLEF 2013 Plant Identification
Flower FruitLeaf Stem Entire
KakiPersimmon
Silverbirch
Boxeldermapple
Identify 250 plant species from different organs (Leaf, Flower, Fruit, etc.)
Got the 1st place in Natural Background task, and in 4/5 subtasks.
(Coming in Sept., 2013.)
21
Conclusion A simple but effective method for FGVC
Embedding co-occurrence patterns of neighboring descriptors Obtain discriminative and small-dimensional latent descriptor
to use together with Fisher vector Polynomial embedding greatly improves the performance,
indicating the importance of non-linearity
Patch-level strong supervision approximation
Not always perfect but reasonable for FGVC problems
Future work Theoretical analysis (probabilistic interpretation) Multiple instance dimensionality reduction
22
Thank you!
Any questions?
23
Object and scene categorization
Caltech-101 (Object dataset)
MIT-Indoor-67 (Scene dataset)
24
Results: Object and scene categorization
Our method seems to be not as effective as in FGVC problems
Combining PCA feature + our feature consistent improves performance
Mean classification accuracy (%)
0
10
20
30
40
50
60
70
80
SIFT C-SIFT Opp.-SIFT0
10
20
30
40
50
60
SIFT C-SIFT Opp.-SIFT
Caltech-101 MIT-Indoor-67
0
10
20
30
40
50
60
70
80
SIFT C-SIFT Opp.-SIFT0
10
20
30
40
50
60
SIFT C-SIFT Opp.-SIFT
25