Semantic Kernel Forests from Multiple Taxonomies
Sung Ju Hwang (University of Texas at Austin), Fei Sha (University of Southern California), and Kristen Grauman (University of Texas at Austin)
Limitation of status quo recognition
Until recently, most categorization methods relied solely on category labels, treating each instance as an isolated entity.
[Figure: semantic space vs. visual world: the categories Cat, Dog, Wolf, Zebra are treated as isolated labels 1-4, ignoring their relationships]
Limitation of status quo recognition
[Figure: semantic space vs. visual world: Cat, Dog, Wolf, Zebra related through Canine, Pet, and Wild superclasses; semantically similar classes may be visually dissimilar, and vice versa]
However, semantic entities exist in relation to one another, and larger, finer-grained datasets yield more meaningful relations.
How can we exploit such relations for improved categorization?
[Fergus10] Semantic Label Sharing for Learning with Many Categories, R. Fergus, H. Bernal, Y. Weiss, A. Torralba, ECCV 2010
[Zhao11] Large Scale Category Structure Aware Image Classification, B. Zhao, L. Fei-Fei, E. P. Xing, NIPS 2011
Motivation
Our focus: a semantic taxonomy
But there are potentially two snags:
1) Partial alignment between the taxonomy and the visual distribution
2) No single 'optimal' taxonomy
[Figure: three alternative taxonomies over Dalmatian, Wolf, Siamese cat, and Leopard: Biological (Canine vs. Feline under Animal), Appearance (Spotted vs. Pointy Corner under Texture), and Habitat (Domestic vs. Wild under Tameness)]
What information should we exploit from multiple taxonomies, and how do we leverage it?
Idea
Exploit multiple semantic taxonomies for visual feature learning
[Figure: the three taxonomies (Biological, Appearance, Habitat) again, each suggesting different discriminative features: dog-like shape and cat face; spots and pointy corners; indoor settings with people, and woods]
- Taxonomies provide human merge/split criteria
- Each taxonomy provides complementary information
How do we then,
1. Learn granularity- and view-specific features on each taxonomy, and
2. Combine the learned features across taxonomies for object recognition?
Overview
Goal: Learn and combine features across multiple taxonomies
[Figure: pipeline: from each taxonomy (Biological, Appearance, Habitat), learn view- and granularity-specific features such as dog-like shape, cat face, spots, pointy corners, indoor settings with people, and woods, then feed them to the categorization model]
1. Learn view- and granularity-specific features from each taxonomy
2. Optimally combine the learned features for categorization
Categorization model
[Hwang11] S. J. Hwang, F. Sha, K. Grauman, Learning a Tree of Metrics with Disjoint Visual Features, NIPS 2011
Tree of Metrics
How do we learn granularity- and view-specific features?
[Figure: example taxonomy: Carnivore splits into Canine (Dalmatian, Wolf) and Feline (Domestic cat, Big cat), and Domestic cat splits into Siamese cat and Persian cat]
Intuition: features useful for discriminating among the superclasses are less useful for subcategory discrimination.
- Exploit the parent-child relationship to isolate features at each node
Tree of Metrics
Given a taxonomy, we learn a metric for each internal (superclass) node n to discriminate between its subclasses.
We approach the feature learning problem as hierarchical metric learning with disjoint regularization.
[Figure: a triplet (x_i, x_j, x_l) separated by a margin under a node's metric, and the learned Canine and Feline metric matrices, where a lighter element has a higher value]
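The large-margin metric learning described above can be sketched minimally in code. Everything below (a diagonal metric, toy 2-D points, the learning rate) is an illustrative assumption, not the authors' implementation:

```python
# Minimal sketch of one node's large-margin metric update (illustrative
# only; diagonal metric, toy points, and learning rate are made up).
import numpy as np

def hinge_metric_step(M, xi, xj, xl, lr=0.1, margin=1.0):
    """One subgradient step: pull the same-subclass pair (xi, xj) together
    and push the differently-labeled xl at least `margin` farther away."""
    dist = lambda a, b: ((a - b) ** 2) @ M        # squared Mahalanobis distance, diagonal M
    loss = max(0.0, margin + dist(xi, xj) - dist(xi, xl))
    if loss > 0:                                  # subgradient of the hinge w.r.t. diag(M)
        grad = (xi - xj) ** 2 - (xi - xl) ** 2
        M = np.maximum(M - lr * grad, 0.0)        # project back to a valid (nonnegative) metric
    return M, loss

M = np.ones(2)                                    # start from the Euclidean metric
xi, xj = np.array([0.0, 0.0]), np.array([0.2, 0.1])   # same subclass
xl = np.array([0.3, 0.2])                             # different subclass
for _ in range(50):
    M, loss = hinge_metric_step(M, xi, xj, xl)
```

Each step increases the weight of features that separate the impostor while keeping the metric valid via projection.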
Tree of Metrics
Further, we learn all metrics simultaneously with two regularizations:
- A sparsity-based regularization to identify informative features.
- A disjoint regularization to learn features exclusive to each granularity.
[Figure: the taxonomy again, with a metric learned at each internal node]
Regularization Terms to Learn Compact, Discriminative Metrics
Sparsity regularization: how can we select a few informative features at each node?
Minimize the sum of the diagonal entries of each metric.
→ Competition between the features within a single metric
Regularization Terms to Learn Compact, Discriminative Metrics
Disjoint regularization: how can we regularize each metric to use features disjoint from its ancestors'?
Enforce that an ancestor metric and a descendant metric do not both have a large value for the same feature.
→ Competition between ancestors and descendants
Both regularizers are convex
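As a rough sketch, the two regularizers might look as follows. The trace form matches "minimize the sum of the diagonal entries"; the disjoint term here is an exclusive-lasso-style guess at a convex penalty with the stated behavior, not necessarily the exact form in [Hwang11]:

```python
# Illustrative sketch of the two convex regularizers (assumed forms).
# Diagonal metrics are stored as nonnegative vectors.
import numpy as np

def sparsity_reg(M):
    """Trace penalty: sum of diagonal entries -> few active features."""
    return M.sum()

def disjoint_reg(M_anc, M_desc):
    """Exclusive-lasso-style coupling: for each feature, square the total
    weight placed on it by ancestor and descendant. At equal total weight,
    overlapping supports are penalized more than disjoint ones."""
    return ((M_anc + M_desc) ** 2).sum()

overlap  = disjoint_reg(np.array([1.0, 0.0]), np.array([1.0, 0.0]))  # same feature used twice
disjoint = disjoint_reg(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # different features
```

With equal total weight, the overlapping configuration incurs a strictly larger penalty, which is exactly the competition between ancestors and descendants the slide describes.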
Overview
Goal: Learn and combine features across multiple taxonomies
[Figure: the full pipeline again: per-taxonomy feature learning, then combination]
1. Learn view- and granularity-specific features from each taxonomy
2. Optimally combine the learned features for categorization
Categorization model
[Hwang12] S. J. Hwang, F. Sha, K. Grauman, Semantic Kernel Forests from Multiple Taxonomies, NIPS 2012
Semantic Kernel Forest
[Figure: the three taxonomies (Biological, Appearance, Habitat) over Dalmatian, Wolf, Siamese cat, and Leopard]
From multiple ToMs, we obtain a semantic kernel forest: a set of non-linear, view- and granularity-specific feature spaces.
We compute an RBF kernel on the distances computed under each learned metric.
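Concretely, one kernel per node might be computed like this (a sketch; the diagonal metric and the bandwidth gamma are assumptions):

```python
# Sketch: turn a node's learned (diagonal) metric into an RBF kernel,
# as the slide describes. gamma is an assumed bandwidth.
import numpy as np

def semantic_rbf_kernel(X, M, gamma=1.0):
    """K[i, j] = exp(-gamma * d_M(x_i, x_j)), where d_M is the squared
    Mahalanobis distance under the diagonal metric M."""
    diff = X[:, None, :] - X[None, :, :]      # (n, n, d) pairwise differences
    d2 = (diff ** 2 * M).sum(axis=-1)         # squared distances under M
    return np.exp(-gamma * d2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
M = np.array([1.0, 0.1])                      # this node downweights feature 2
K = semantic_rbf_kernel(X, M)
```

Because the metric downweights the second feature, points far apart along it can still receive a high kernel value: the kernel inherits the node's notion of similarity.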
Semantic Kernel Forest
[Figure: the semantic kernel forest: one kernel per internal node of each taxonomy (Biological, Appearance, Habitat)]
How do we combine the learned kernel forest for optimal discrimination?
Obtain a class-specific kernel by linearly combining the kernels on the tree paths via multiple kernel learning (MKL).
Only a small fraction of relevant kernels is considered: O(T log N) for T taxonomies over N classes.
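In symbols, the class-specific combination might be written as follows. The notation is reconstructed from the slide (β are the learned nonnegative kernel weights, T the number of taxonomies, path_t(c) the root-to-leaf path of class c in taxonomy t); see [Hwang12] for the exact formulation:

```latex
K_c(x, x') \;=\; \sum_{t=1}^{T} \;\sum_{n \in \mathrm{path}_t(c)} \beta_{c,n}\, k_n(x, x'),
\qquad \beta_{c,n} \ge 0 .
```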
Proposed Sparse Hierarchical Regularization
[Figure: kernel weights on the Biological and Habitat taxonomies; plain L1 selects an interleaved subset of kernels across the trees]
Usual L1 regularization selects a few useful kernels, but multiple taxonomies may provide some redundant kernels.
- Interleaved selection of kernels
Are all kernels equal?
Hierarchical regularization: the weight of a node must be larger than its children's.
[Figure: on the Biological taxonomy, each parent kernel's weight is constrained to exceed its children's]
Proposed Sparse Hierarchical Regularization
- Implicitly enforce hierarchical structure among kernels
[Figure: the same ordering constraint on the Habitat taxonomy]
Multiple taxonomies provide redundant kernels; higher-level kernels discriminate among more categories.
Optimization for Semantic Kernel Forest
We minimize the sum of the MKL objective and the regularization terms (sparsity + hierarchical).
- The objective is nonsmooth due to the hierarchical regularization term.
- We use projected subgradient descent to optimize.
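A toy sketch of the projected-subgradient idea on this slide. All constants and the exact hinge form are assumptions; only the L1 term, a hinge nudging each parent's weight above its children's, and the projection onto β ≥ 0 are shown, with the data-fit part of the MKL objective omitted:

```python
# Toy projected-subgradient sketch for the hierarchical regularizer
# (illustrative; constants and hinge form assumed, MKL data-fit omitted).
import numpy as np

beta = np.array([0.1, 0.5, 0.4])          # kernel weights: node 0 is the parent of 1 and 2
edges = [(0, 1), (0, 2)]                  # (parent, child) pairs from one taxonomy
lam, mu, lr = 0.1, 1.0, 0.05              # L1 weight, hierarchy weight, step size

for _ in range(30):
    g = lam * np.ones_like(beta)          # subgradient of the L1 term (beta >= 0)
    for p, c in edges:                    # hinge pushing beta[p] above beta[c]
        if beta[c] > beta[p]:
            g[p] -= mu
            g[c] += mu
    beta = np.maximum(beta - lr * g, 0.0) # projection onto the nonnegative orthant
```

Starting from child weights larger than the parent's, the hinge transfers mass upward until the hierarchy holds, after which only the L1 shrinkage acts.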
Datasets
AWA-10: 6,180 images, 10 animal classes, fine-grained
ImageNet-20: 28,957 images, 20 non-animal classes, coarser-grained
Taxonomies, constructed on different attribute groups: (a) WordNet, (b) Appearance, (c) Behavior, (d) Habitat for AWA-10; (a) WordNet, (b) Visual, (c) Attributes for ImageNet-20
Multiclass Classification Results
Method                         | AWA-4      | AWA-10     | ImageNet-20
Raw feature kernel             | 47.67±2.22 | 30.80±1.36 | 28.20±1.45
Raw feature kernel + MKL       | 48.50±1.89 | 31.13±2.81 | 27.67±1.50
Perturbed semantic kernel tree | N/A        | 31.53±2.07 | 28.20±2.02
We compare to three baselines
- Raw feature kernel: RBF kernel computed on the original image features
- Raw feature kernel + MKL: MKL with RBF kernels of different bandwidths
- Perturbed semantic kernel tree: semantic kernel forest on a randomly permuted taxonomy
Multiclass Classification Results
Method                         | AWA-4      | AWA-10     | ImageNet-20
Raw feature kernel             | 47.67±2.22 | 30.80±1.36 | 28.20±1.45
Raw feature kernel + MKL       | 48.50±1.89 | 31.13±2.81 | 27.67±1.50
Perturbed semantic kernel tree | N/A        | 31.53±2.07 | 28.20±2.02
Semantic kernel tree + Avg     | 47.17±2.40 | 31.92±1.21 | 28.97±1.61
Semantic kernel tree + MKL     | 48.89±1.06 | 32.43±1.93 | 29.74±1.26
Semantic kernel tree + MKL-H   | 50.06±1.12 | 32.68±1.79 | 29.90±0.70
Semantic kernel tree (ToM) > perturbed kernel tree
- Semantic kernel tree + Avg: averaged semantic kernel tree on a single taxonomy
- Semantic kernel tree + MKL: MKL on a single taxonomy with only the sparsity regularization
- Semantic kernel tree + MKL-H: with both sparsity and hierarchical regularization
- More meaningful grouping/splits for object categorization
Multiclass Classification Results
Method                         | AWA-4      | AWA-10     | ImageNet-20
Raw feature kernel             | 47.67±2.22 | 30.80±1.36 | 28.20±1.45
Raw feature kernel + MKL       | 48.50±1.89 | 31.13±2.81 | 27.67±1.50
Perturbed semantic kernel tree | N/A        | 31.53±2.07 | 28.20±2.02
Semantic kernel tree + Avg     | 47.17±2.40 | 31.92±1.21 | 28.97±1.61
Semantic kernel tree + MKL     | 48.89±1.06 | 32.43±1.93 | 29.74±1.26
Semantic kernel tree + MKL-H   | 50.06±1.12 | 32.68±1.79 | 29.90±0.70
Semantic kernel forest + MKL   | 49.67±1.11 | 34.60±1.78 | 30.97±1.14
Semantic kernel forest + MKL-H | 52.83±1.68 | 35.87±1.22 | 32.30±1.00
- Semantic kernel forest + MKL: MKL with kernels learned on multiple taxonomies, with only the sparsity regularization
- Semantic kernel forest + MKL-H: with both sparsity and hierarchical regularization
Multiple taxonomies > a single taxonomy
- Each taxonomy provides complementary information
Multiclass Classification Results
Method                         | AWA-4      | AWA-10     | ImageNet-20
Raw feature kernel             | 47.67±2.22 | 30.80±1.36 | 28.20±1.45
Raw feature kernel + MKL       | 48.50±1.89 | 31.13±2.81 | 27.67±1.50
Perturbed semantic kernel tree | N/A        | 31.53±2.07 | 28.20±2.02
Semantic kernel tree + Avg     | 47.17±2.40 | 31.92±1.21 | 28.97±1.61
Semantic kernel tree + MKL     | 48.89±1.06 | 32.43±1.93 | 29.74±1.26
Semantic kernel tree + MKL-H   | 50.06±1.12 | 32.68±1.79 | 29.90±0.70
Semantic kernel forest + MKL   | 49.67±1.11 | 34.60±1.78 | 30.97±1.14
Semantic kernel forest + MKL-H | 52.83±1.68 | 35.87±1.22 | 32.30±1.00
Hierarchical regularizer > Standard L1 regularization
- The regularizer's effect is minimal on the single-taxonomy semantic kernel tree, which lacks redundancy
- It pays to consider the structure of the feature spaces
Confusion matrices on 4 animal classes
Blue: low confusion; Red: high confusion
Each taxonomy is suboptimal, but provides complementary information that can be optimally leveraged with MKL.
[Figure: confusion matrices for Dalmatian, Wolf, Siamese cat, and Leopard under the Biological (Canine/Feline), Appearance (Spotted/Pointy Ear), and Habitat (Domestic/Wild) taxonomies, compared across WordNet, Appearance, Behavior, and Habitat]
Effect of hierarchical regularization
Sparsity regularization only: 34.33
Sparsity + hierarchical: 35.67
The hierarchical regularizer avoids overfitting through the implicit structure enforced among the kernels.
[Figure: learned kernel weights per taxonomy node. WordNet: procyonid, feline, even-toed, aquatic, carnivore, placental; appearance: cat/rat, hairless, ~panda, raccoon/rat; behavior: land/aquatic, predator/prey; habitat: jungle/non-jungle, aquatic, land]
Key message: semantic taxonomies for visual feature learning
Summary
- Exploits disjoint sparsity between parent and child classes in a taxonomy: Tree of Metrics
- Leverages complementary information from multiple semantic taxonomies: Semantic Kernel Forest
- Novel regularizers that exploit category relations:
  - A disjoint regularizer that exploits the parent-child relationship to learn disjoint features.
  - A hierarchical regularizer that favors upper-level kernels.
[Hwang12] S. J. Hwang, F. Sha, K. Grauman, Semantic Kernel Forests from Multiple Taxonomies, NIPS 2012
[Hwang11] S. J. Hwang, F. Sha, K. Grauman, Learning a Tree of Metrics with Disjoint Visual Features, NIPS 2011
Intuition: competing features between parent and child; complementary information across different semantic views.
Learning methods: a disjoint regularizer for competing features; MKL with a hierarchical regularizer for complementary features.
Per-class results
A single taxonomy often improves performance on some classes at the expense of others; each individual taxonomy is suboptimal.
- Habitat: better for h. whale, worse for panda
- WordNet: better for panda, worse for h. whale
- All: better for both
The semantic kernel forest takes the best of both through the learned combination.
Idea
Learn a non-linear feature space for each view and granularity, one that splits the categories according to each merge/split criterion.
[Figure: per-split feature spaces: Canine vs. Feline (dog-like shape vs. cat face); Spotted vs. Pointy Corner textures; Domestic vs. Wild (indoor settings with people vs. woods)]
Then, combine the feature spaces to obtain an optimally discriminative space for categorization.
[Figure: combined feature space]
How do we then
- learn such features, and
- optimally combine them?