Towards Weakly Supervised
Object Segmentation & Scene Parsing
Yunchao Wei
IFP, Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA
Self-Erasing Network for Integral Object Attention
Qibin Hou1, Peng-Tao Jiang1, Yunchao Wei2, Ming-Ming Cheng1
1College of Computer Science, Nankai University, Tianjin, China
2IFP, Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA
[Figure: Image, Fully Convolutional Networks, Object Localization Map]
Background
Weak Supervision: annotations at the training stage that are of a lower degree (cheaper, simpler) than the outputs required at the testing stage.
[Figure: four kinds of weak annotation on an image of a horse and a person: scribbles, points, image-level labels, bounding boxes]
Background
The Popular Pipeline
[Figure: Weak Supervision, Learn to Produce Pseudo Mask (bkg, horse, person), FCN, Loss]
Background
Our Target & Current Issue
Current issue: small and sparse object localization maps from Class Activation Mapping [Zhou CVPR16]
Our target: dense and integral object localization maps
[Figure: example images labeled dog, bird]
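As a reminder of how the baseline maps are obtained, Class Activation Mapping weights the last convolutional feature maps by the classifier weights of one class and sums them. The sketch below is a minimal NumPy illustration, not the authors' code: the array shapes, the random inputs, and the final clipping and normalization to [0, 1] are assumptions made for display purposes.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a CAM for one class.

    features:   (C, H, W) feature maps from the last conv layer
    fc_weights: (num_classes, C) weights of the linear layer after GAP
    """
    # Weighted sum over the C channels using the weights of the chosen class.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    # Assumption: keep only positive evidence and rescale to [0, 1] for display.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Toy inputs (random, purely illustrative).
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
fc_weights = rng.standard_normal((3, 8))
cam = class_activation_map(features, fc_weights, class_idx=1)
```

Because the classifier only needs the most discriminative evidence to be correct, maps computed this way tend to highlight small parts of an object, which is the issue named above.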
Revisit Adversarial Erasing
Object Region Mining with Adversarial Erasing [Wei CVPR17]
Adversarial Complementary Learning [Zhang CVPR18]
Over Erasing: The Failure Case of Adversarial Erasing
The background regions cannot be suppressed!
Our Solution: Self-Erasing Network
Motivation
[Figure: Image, Attention Map, Ternary Mask with object priors and background priors]
Our Solution: Self-Erasing Network
Ternary Mask
[Figure: Attention Map thresholded with δ_h and δ_l into a Ternary Mask with object priors and background priors]
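A ternary mask of this kind can be sketched by thresholding an attention map with two values. The snippet below is only an illustration under stated assumptions: the integer codes for the three zones, the threshold values, and the function name `ternary_mask` are hypothetical and are not taken from the slides.

```python
import numpy as np

# Illustrative zone codes (assumption, not from the slides).
OBJECT, UNCERTAIN, BACKGROUND = 1, 0, -1

def ternary_mask(attention, delta_h, delta_l):
    """Split an attention map into three zones using two thresholds."""
    if not delta_l < delta_h:
        raise ValueError("delta_l must be smaller than delta_h")
    mask = np.full(attention.shape, UNCERTAIN, dtype=np.int8)
    mask[attention >= delta_h] = OBJECT      # confident object prior
    mask[attention < delta_l] = BACKGROUND   # confident background prior
    return mask

# Toy attention map with values in [0, 1]; thresholds are hypothetical.
attention = np.array([[0.9, 0.5],
                      [0.2, 0.01]])
mask = ternary_mask(attention, delta_h=0.7, delta_l=0.05)
```

Keeping an explicit background zone is what lets later branches avoid spreading attention into regions that are very unlikely to belong to the object.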
[Figure: network diagram. Blocks: Backbone, Conv, GAP, Loss, branches S_A, S_B, S_C, Attention, Ternary Mask T, C-ReLU strategy I, C-ReLU strategy II. Legend: T = thresholding, GAP = global average pooling. Example label: dining table]
Our Solution: Self-Erasing Network
Framework
[Figure: framework diagram. Blocks: Conv, GAP, Loss, branches S_A and S_C, Attention, Ternary Mask T]
Experimental Results
[Figure: qualitative comparison, Ours vs. ACoL [Zhang CVPR18]]
Pascal VOC 2012
Weakly Supervised Scene Parsing with Point-based Distance Metric Learning
Rui Qian1,3, Yunchao Wei3, Honghui Shi2,3, Jiachen Li3, Jiaying Liu1 and Thomas Huang3
1Institute of Computer Science and Technology, Peking University, Beijing, China
2IBM T.J. Watson Research Center, 3IFP, Beckman, UIUC
[Figure: Point Annotation vs. Full Annotation]
Review on Weakly Supervised Methods
◼ Weakly supervised methods for scene parsing
  ◼ Image-level supervision
  ◼ Box supervision
  ◼ Scribble supervision
  ◼ Point supervision
[Figure: example labels Person, Bike, Tree, Sky, Road]
Annotation Comparison
◼ Annotation burden comparison
Method                               Full    Scribble   Point
Average annotated pixels per image   170K    1817.48    12.26
Motivation
[Figure: two Input / PointSup pairs, each marked "Limited label!"]
How to utilize limited annotation?
Cross-image semantic similarity!
[Figure: example classes train, track, tree]
Proposed method (1/5)
◼ Overview
  ◼ Point-based distance metric learning (PDML)
  ◼ Point supervision (PointSup)
  ◼ Online extension supervision (ExtendSup)
[Figure: Input, Feature Extract, Feature, PDML, Classification Module, Prediction, PointSup, ExtendSup, Cross-Entropy Loss]
Proposed method (2/5)
◼ Point supervision
  ◼ Only calculate cross-entropy loss on annotated pixels
  ◼ Back-propagate gradients accordingly
  ◼ Optimize by stochastic gradient descent
[Figure: Input, Feature Extract, Feature, Classification Module, Prediction, PointSup, Cross-Entropy]
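Restricting the cross-entropy loss to annotated pixels can be sketched as follows. This is a minimal NumPy illustration of the loss value only (no gradients or SGD); the `IGNORE` label value and the function name `point_cross_entropy` are assumptions introduced for the example.

```python
import numpy as np

IGNORE = -1  # hypothetical label for pixels that carry no point annotation

def point_cross_entropy(logits, labels):
    """Mean cross-entropy over annotated pixels only.

    logits: (K, H, W) class scores; labels: (H, W) ints, IGNORE where unlabeled.
    """
    annotated = labels != IGNORE
    if not annotated.any():
        return 0.0
    z = logits[:, annotated]                      # (K, N) scores at labeled pixels
    z = z - z.max(axis=0, keepdims=True)          # stabilize the softmax
    log_prob = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    picked = log_prob[labels[annotated], np.arange(z.shape[1])]
    return float(-picked.mean())

# Toy example: 3 classes, a 2x2 image, two annotated pixels.
logits = np.zeros((3, 2, 2))
labels = np.array([[0, IGNORE],
                   [IGNORE, 2]])
loss = point_cross_entropy(logits, labels)
```

Since unlabeled pixels are dropped before the softmax, they contribute nothing to the loss, so no gradient would flow from them during training.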
Proposed method (3/5)
◼ Online extension supervision
  ◼ Extension method 1 (region):
    ◼ Select pixels in a 5×5 square near the annotated ones
  ◼ Extension method 2 (score):
    ◼ Select pixels with a score over 0.7 in the prediction
  ◼ Finally choose the intersection of the two methods
[Figure: Input, Feature Extract, Feature, Classification Module, Prediction, PointSup, Cross-Entropy, Online Extension]
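The intersection of the region rule and the score rule can be sketched as below. Several details are assumptions, since the slides do not specify them: the score is taken to be the predicted probability of the annotated point's class, extended pixels inherit that class, the 5×5 square is centered on the point, and already labeled pixels are never overwritten. `IGNORE` and `extend_labels` are hypothetical names.

```python
import numpy as np

IGNORE = -1  # hypothetical label for unannotated pixels

def extend_labels(probs, labels, half=2, thresh=0.7):
    """Extend point labels to nearby confident pixels.

    probs:  (K, H, W) predicted class probabilities
    labels: (H, W) point annotations, IGNORE elsewhere
    half=2 gives the 5x5 square; thresh=0.7 is the score threshold.
    """
    H, W = labels.shape
    extended = labels.copy()
    for y, x in zip(*np.nonzero(labels != IGNORE)):
        c = labels[y, x]
        y0, y1 = max(y - half, 0), min(y + half + 1, H)
        x0, x1 = max(x - half, 0), min(x + half + 1, W)
        window = extended[y0:y1, x0:x1]                 # view into `extended`
        confident = probs[c, y0:y1, x0:x1] > thresh     # score rule
        # Region rule AND score rule; keep labels that are already set.
        window[confident & (window == IGNORE)] = c
    return extended

# Toy example: class 1 is confident only in the top four rows.
p1 = np.full((7, 7), 0.1)
p1[:4, :] = 0.9
probs = np.stack([1.0 - p1, p1])
labels = np.full((7, 7), IGNORE)
labels[3, 3] = 1
extended = extend_labels(probs, labels)
```

Taking the intersection keeps the extension conservative: a pixel is added only if it is both spatially close to a point and already predicted with high confidence.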
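Proposed method (4/5)
◼ Point-based distance metric learning
[Figure (a): a subgroup of images I_a, I_b, I_c, I_d with annotated classes such as train and platform, fed to Point-based Distance Metric Learning]
[Figure (b): I_a and I_b pass through feature extractors with shared weights; feature distances are minimized or maximized between annotated points such as train and platform]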
Proposed method (5/5)
◼ Loss function of PDML
  ◼ For each image $I_a$, define the embedding vector set as $E_a$:
    ◼ $E_a = \bigcup_{i=1}^{|M_a|} \{P_a^i\}$
    ◼ $|M_a|$ is the number of annotated pixels of $I_a$
    ◼ $P_a^i$ is the feature vector of the $i$-th annotated pixel
  ◼ We optimize in the triplet form $\{P_a^i, P_b^j, P_b^k\}$:
    ◼ $P_b^j$ shares the same category as $P_a^i$
    ◼ $P_b^k$ belongs to a different category from $P_a^i$ and $P_b^j$
  ◼ We use the loss function:
    $L_t(P_a^i, P_b^j, P_b^k) = \alpha L_p(P_a^i, P_b^j) + \beta L_n(P_a^i, P_b^j, P_b^k)$
    ◼ $L_p(P_a^i, P_b^j) = \|P_a^i - P_b^j\|_2$
    ◼ $L_n(P_a^i, P_b^j, P_b^k) = \max(\|P_a^i - P_b^j\|_2 - \|P_a^i - P_b^k\|_2 + m, 0)$
  ◼ $\alpha$, $\beta$, $m$ are hyper-parameters, set to 0.8, 1, 20 in practice
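The triplet loss above can be evaluated for a single triplet as in the following sketch. It assumes the trailing "2" in the slide formulas denotes the L2 norm (not a square), and it handles one triplet of plain NumPy vectors; the function name `pdml_triplet_loss` and the toy vectors are illustrative only.

```python
import numpy as np

def pdml_triplet_loss(p_ai, p_bj, p_bk, alpha=0.8, beta=1.0, m=20.0):
    """L_t = alpha * L_p + beta * L_n for one triplet of feature vectors.

    p_ai, p_bj: features of annotated pixels with the same category
    p_bk:       feature of an annotated pixel with a different category
    """
    d_pos = np.linalg.norm(p_ai - p_bj)        # same-class distance
    d_neg = np.linalg.norm(p_ai - p_bk)        # different-class distance
    l_p = d_pos                                # pull same-class features together
    l_n = max(d_pos - d_neg + m, 0.0)          # push different-class beyond margin m
    return float(alpha * l_p + beta * l_n)

anchor = np.array([0.0, 0.0])
positive = np.array([3.0, 4.0])                # distance 5 from the anchor
far_negative = np.array([0.0, 30.0])           # distance 30, margin satisfied
near_negative = np.array([0.0, 10.0])          # distance 10, margin violated

loss_far = pdml_triplet_loss(anchor, positive, far_negative)
loss_near = pdml_triplet_loss(anchor, positive, near_negative)
```

With the far negative only the positive term remains, while the near negative adds a hinge penalty, which shows how the margin m separates the two cases.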
Datasets
◼ Scene parsing datasets
  ◼ PASCAL-Context
  ◼ ADE20K
Dataset          #Training   #Evaluation   #Instance/Image
PASCAL-Context   4998        5105          12.26
ADE20K           20210       2000          13.96
Experimental Results
FullSup   PointSup   PDML   Online Ext.   mIoU   Pixel Acc
√                                         39.6   78.6%
          √                               27.9   55.3%
          √          √                    29.7   57.5%
          √          √      √             30.0   57.6%
◼ Quantitative evaluation on PASCAL-Context
  ◼ The combination of the three techniques is best
  ◼ We use only 0.007% of the annotated data but reach 75% of the full-supervision performance!
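The two headline figures can be checked against numbers reported elsewhere in the deck. The short calculation below assumes that "170K" in the annotation comparison means 170,000 pixels per image and that the performance ratio is taken over mIoU; both readings are assumptions.

```python
# Average annotated pixels per image (from the annotation comparison).
full_pixels = 170_000        # assumption: "170K" read as 170,000
point_pixels = 12.26

# mIoU on PASCAL-Context (from the results table).
miou_full = 39.6
miou_ours = 30.0

annotation_percent = 100.0 * point_pixels / full_pixels
relative_miou = miou_ours / miou_full

print(f"annotated data: {annotation_percent:.4f}% of full annotation")
print(f"relative mIoU:  {100.0 * relative_miou:.1f}% of full supervision")
```

Under these assumptions the annotation ratio rounds to 0.007% and the relative mIoU is a little above 75%, in line with the claim on the slide.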
Experimental Results
FullSup      PointSup   PDML   Online Ext.   mIoU   Pixel Acc
√                                            33.9   75.8%
√ (SegNet)                                   21.0   /
             √                               17.7   58.0%
             √          √                    19.0   59.0%
             √          √      √             19.6   61.0%
◼ Quantitative evaluation on ADE20K
  ◼ The combination of the three techniques is best
  ◼ Our method approaches the result of SegNet under full supervision
Subjective Evaluation
[Figure: qualitative results with columns Image, PointSup, PDML, Final, GT]
Discussion
◼ Visualizing the effect of PDML
  ◼ dis(+): L2 norm distance between same-class feature vectors
  ◼ dis(-): L2 norm distance between different-class feature vectors
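The two quantities dis(+) and dis(-) can be computed from a set of labeled feature vectors as sketched below. Averaging over all unordered pairs is an assumption, since the slides only define the distances themselves; the function name `mean_pair_distances` and the toy features are hypothetical. The sketch requires at least one same-class pair and one different-class pair.

```python
import numpy as np

def mean_pair_distances(features, labels):
    """Return (dis_pos, dis_neg): mean L2 distance over same-class and
    different-class pairs of feature vectors.

    features: (N, D) feature vectors; labels: (N,) class ids.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N) pairwise L2
    same = labels[:, None] == labels[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each pair counted once
    dis_pos = dist[same & upper].mean()
    dis_neg = dist[~same & upper].mean()
    return float(dis_pos), float(dis_neg)

# Toy features: two tight clusters far apart, one per class.
features = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
dis_pos, dis_neg = mean_pair_distances(features, labels)
```

A metric-learning objective of the kind described here is expected to lower dis(+) and raise dis(-), which is what the visualization on the slide compares.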
Discussion
◼ Ablation on the design of the PDML loss function
  ◼ $L_t(P_a^i, P_b^j, P_b^k) = \alpha L_p(P_a^i, P_b^j) + \beta L_n(P_a^i, P_b^j, P_b^k)$
Conclusion
◼ Problem
  ◼ Point-guided scene parsing
◼ Point-based distance metric learning
  ◼ Exploit semantic relationship across images
◼ Experimental results
  ◼ Good performance both quantitatively and qualitatively
Weakly Supervised Learning for Real-World Computer Vision Applications & The 1st Learning from Imperfect Data (LID) Challenge
CVPR 2019 Workshop, Long Beach, CA
Task 1: Object Segmentation on ILSVRC DET (Image-level Supervision)
Task 2: Scene Parsing on ADE20K (Point Supervision)
https://lidchallenge.github.io/
Thank you