Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
1
Dual-Gradients Localization framework for Weakly Supervised Object Localization
Chuangchuang Tan 1*, Tao Ruan 1*, Guanghua Gu 2, Shikui Wei 1, Yao Zhao 1
1 Beijing Jiaotong University 2 Yanshan University
2
Target
Beijing Jiaotong University
o Weakly Supervised Object Localization (WSOL)
o WSOL is understanding an image at pixel level only using image-level annotations
o use much cheaper annotations
3
WSOL
Beijing Jiaotong University
o Steps of previous works :o Force classification network to focus on more regions of feature map.
o Produce localization map on the last convolutional layer by applying CAM.
o Problem:o ignore the localization ability of other layers.
o Both localization and classification tasks are
trained online I can produce WSOL, too
4
Dual-Gradients Localization(DGL) framework
Beijing Jiaotong University
o Characteristicso Simple, DGL is a offline approach,
needn’t to train for localization.
o Effective, achieving localization on any convolutional layer.
o Main ideas:o Utilize gradients of classification loss function to mine entire target object regions.
o Leverage gradients of target class to identify the correlation ratio of pixels to the target
class within any convolutional feature maps
Source image Mixed_6f Mixed_6e
5
𝑆𝑖
Overview of the DGL framework
Beijing Jiaotong University
Cross-entropy loss
GAP
softmax
FC… …
𝜕𝐽(p, 𝛼𝑦𝑐)
𝜕𝑆
𝜕𝑦𝐶𝜕𝑆
Class-aware Enhanced Map Branch
⊝
𝑙2 𝑛𝑜𝑟𝑚𝑜𝑙𝑖𝑧𝑒
𝑙2 𝑛𝑜𝑟𝑚𝑜𝑙𝑖𝑧𝑒
Enhanced map ⨂
Pixel-level Selection Branch
…
Classification model
Feature Maps
𝑠𝑢𝑚 𝑎𝑛𝑑 𝑟𝑒𝑠𝑖𝑧𝑒
𝑆𝑖 𝑆𝑛−1 𝑆𝑛
𝑆𝑛
Localization Maps
6
Classification model
Beijing Jiaotong University
Cross-entropy loss
GAP
softmax
FC… …Classification model
𝑆𝑖 𝑆𝑛−1 𝑆𝑛
o Classification model architecture:o use a customized InceptionV3, i.e. SPG-plain.
o remove the layers after the second Inception block, i.e., the third Inception block, pooling and linear layer.
o add two convolutional layers
o add a GAP layer and a softmax layer
7
𝑆𝑖
Class-aware Enhanced Map Branch
Beijing Jiaotong University
𝜕cost(p, 𝛼𝑦𝑐)
𝜕𝑆
Class-aware Enhanced Map Branch
⊝
𝑙2 𝑛𝑜𝑟𝑚𝑜𝑙𝑖𝑧𝑒
𝑙2 𝑛𝑜𝑟𝑚𝑜𝑙𝑖𝑧𝑒
Enhanced map A
Feature Maps
𝑆𝑛
o feature maps predicted to class c only capture the discrimination parts of objects, when the feature maps close the boundary of classification regions
o the feature maps located at center of classification regions can highlight more object regions
8
𝑆𝑖
Class-aware Enhanced Map Branch
Beijing Jiaotong University
𝜕cost(p, 𝛼𝑦𝑐)
𝜕𝑆
Class-aware Enhanced Map Branch
⊝
𝑙2 𝑛𝑜𝑟𝑚𝑜𝑙𝑖𝑧𝑒
𝑙2 𝑛𝑜𝑟𝑚𝑜𝑙𝑖𝑧𝑒
Enhanced map A
Feature Maps
𝑆𝑛
o our key idea of Class-aware Enhanced Map is pulling the feature maps toward inside of the classification region for specific-class, along with gradients of classification loss function.
9
Pixel-level Selection Branch
Beijing Jiaotong University
𝜕𝑦𝐶𝜕𝑆
⨂
Pixel-level Selection Branch
…𝑠𝑢𝑚 𝑎𝑛𝑑 𝑟𝑒𝑠𝑖𝑧𝑒
Enhanced map A
o Is gradients or weights?o CAM actually achieves localization by employing a weighted sum of
feature maps and gradients of target class on the last convolutional layer, instead of weights of the final FC layer.
o Pixel-level Selection is a generalization to CAM.
10
Results on the Validation Set of LID
Beijing Jiaotong University
MS: Multi-scale inputs during test MC: Morph close the localization map during test
MS MC mIoU
✘ ✘ 58.23
✔ ✘ 61.46
✔ ✔ 62.22
o Fusion the localization maps of branch1 and branch2 on Mixed_6e layer.
o Input size 324
11
o Examples of DGL on test set
Qualitative Results
Beijing Jiaotong University
12
Thanks
Beijing Jiaotong University