Deep Learning for Computer Vision, Fall 2021
http://vllab.ee.ntu.edu.tw/dlcv.html (Public website)
https://cool.ntu.edu.tw/courses/8854 (NTU COOL)
Yu‐Chiang Frank Wang 王鈺強, Professor
Dept. Electrical Engineering, National Taiwan University
2021/12/14
What to Cover Today…
• Meta‐Learning for Few‐Shot Learning
  • Few‐Shot Image Segmentation
  • Few‐Shot Object Detection
• Meta‐Learning for Domain Generalization
  • From Domain Adaptation to Domain Generalization
• Challenges in Few‐Shot Learning Tasks
• Self‐Supervised Learning (SSL)
  • Pretext Tasks
  • Contrastive Learning
Selected slide credits: C. Finn, S. Levine, & H.‐Y. Lee
What If Only Limited Amount of Data Available?
• Naive transfer?
• Model finetuning:
  e.g., train a learning model (e.g., a CNN) on large‐scale data (base classes), followed by finetuning on small‐scale data (novel classes).
  ‐ That is, freeze the feature backbone (learned from the base classes) and learn classifier weights for the novel classes.
• Possibly poor generalization
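The freeze‐and‐finetune baseline above can be sketched as follows (a minimal NumPy sketch; the function name and hyperparameters are illustrative, not from the slides). The backbone is frozen, so the features are fixed inputs and only the classifier weights are learned:

```python
import numpy as np

def train_novel_classifier(feats, labels, n_classes, lr=0.1, steps=200):
    """Finetuning with a frozen backbone: `feats` are fixed features f(x)
    of the novel-class images; only the classifier weights W, b are learned
    (multinomial logistic regression trained by gradient descent)."""
    n, d = feats.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]                      # one-hot labels
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                                # softmax cross-entropy gradient
        W -= lr * feats.T @ G
        b -= lr * G.sum(axis=0)
    return W, b
```

In practice the frozen features would come from a CNN pretrained on the base classes; here they are simply given as input.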
3
(Figure: long‐tailed distribution of # of data per category; a few classes have big data, while many, e.g., objects of interest or driving scenarios, have only small data.)
Meta Learning = Learning to Learn
• A powerful solution for learning from few‐shot data
• Let’s consider the following “2‐way 1‐shot” learning scheme:
4
(Figure: meta‐training over a series of tasks, each with a support (train) set and a query (test) set.
Task i: Cat (+) vs. Dog (‐); Task i+1: Apple (+) vs. Orange (‐); …
Meta‐testing on a novel task: given a support set of Bike (+) vs. Car (‐), predict whether a query Bike image is + or ‐.)
Slide credit: H.‐Y. Lee
Learn to Compare…with the Representative Ones!
• Prototypical Networks
  • Learn a model which properly describes data in terms of intra‐/inter‐class information.
  • It learns a prototype for each class, with data similarity/separation guarantees.
• For the deep version, the learned feature space is derived by a non‐linear mapping 𝑓_φ, and the representative (i.e., prototype) of each class k is its mean feature vector 𝐜_k.
5
Meta‐Training Stage: given the support set S = {(x_i, y_i)}, compute each class prototype as
𝐜_k = (1 / |S_k|) Σ_{(x_i, y_i) ∈ S_k} 𝑓_φ(x_i),
where S_k ⊂ S indicates the support‐set examples of class k.
Meta‐Testing Stage: classify a query x by comparing 𝑓_φ(x) against the prototypes {𝐜_k} (e.g., via a softmax over negative distances).
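The prototype computation and nearest‐prototype classification can be sketched in NumPy as follows (names are illustrative; Prototypical Networks train 𝑓_φ end‐to‐end with a softmax over negative distances, while here the features are given):

```python
import numpy as np

def prototypes(support_feats, support_labels, n_way):
    """c_k = mean of f(x_i) over the support examples of class k."""
    return np.stack([support_feats[support_labels == k].mean(axis=0)
                     for k in range(n_way)])

def classify(query_feats, protos):
    """Assign each query to the nearest prototype (squared Euclidean
    distance); a softmax over negative distances gives the same argmax."""
    d2 = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

For a 2‐way 1‐shot episode, each prototype is simply the single support feature of its class.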
A Closer Look at FSL (3/3)
• Performance with domain shifts (using ResNet‐18)
  • Existing FSL methods fail to address large domain shifts (e.g., mini‐ImageNet → CUB) and are inferior to the baseline methods.
  • This highlights the importance of learning to adapt to domain differences in FSL.
Chen et al., A Closer Look at Few‐shot Classification, ICLR 2019
Semantic Segmentation
• Goal
  • Assign a class label to each pixel in the input image
  • Don’t differentiate instances; only care about pixels
7
Few‐Shot Segmentation
• A large number of image categories come with pixel‐wise ground‐truth labels, while a small number of them have only limited pixel‐wise annotations.
• A shared backbone produces feature maps for both support and query images.
• The prototype for each class is obtained by masked pooling over the support feature maps.
• Query feature maps are then compared with the pooled prototypes pixel‐by‐pixel.
• Typically, cosine similarity is adopted for pixel‐wise feature comparison.
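The masked‐pooling and pixel‐wise comparison steps above can be sketched as follows (a minimal NumPy sketch with illustrative names; real methods operate on CNN feature maps of support and query images):

```python
import numpy as np

def masked_pool(feat, mask):
    """Masked average pooling: average an (H, W, C) support feature map
    over foreground pixels (mask == 1) to obtain a class prototype (C,)."""
    m = mask.astype(float)
    return (feat * m[..., None]).sum((0, 1)) / max(m.sum(), 1e-8)

def cosine_map(query_feat, proto):
    """Pixel-wise cosine similarity between a query feature map and a prototype."""
    q = query_feat / (np.linalg.norm(query_feat, axis=-1, keepdims=True) + 1e-8)
    p = proto / (np.linalg.norm(proto) + 1e-8)
    return q @ p          # (H, W) similarity map
```

Thresholding (or taking an argmax over) the per‐class similarity maps yields the predicted query mask.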
8
OSLSM [BMVC 2017]
Shaban, Amirreza, et al. "One‐shot learning for semantic segmentation." BMVC 2017
• 𝑆 is an annotated image from a new semantic class
• Input 𝑆 to a function 𝑔 that outputs a set of parameters 𝜃
• 𝜃 is used to parameterize part of the segmentation model, which produces a segmentation mask for the image 𝐼 to be segmented
9
OSLSM [BMVC 2017]
Shaban, Amirreza, et al. "One‐shot learning for semantic segmentation." BMVC 2017
Dong, Nanqing, and Eric Xing. "Few‐Shot Semantic Segmentation with Prototype Learning." BMVC. 2018.
Prototype Learning [BMVC 2018]
• A prototype is learned for each foreground class and the background class.
• Prototypes are used to predict rough segmentation maps for each class.
• The final prediction is optimized using probability fusion.
11
PL [BMVC 2018]
Dong, Nanqing, and Eric Xing. "Few‐Shot Semantic Segmentation with Prototype Learning." BMVC 2018.
Siam et al., AMP: Adaptive masked proxies for few‐shot segmentation, ICCV 2019.
AMP [ICCV 2019]
• Adaptive masked proxies (i.e., prototypes) are extracted for each semantic class.
• Proxies update themselves in a continuous stream of data (e.g., video).
• Proxies from different resolution levels are used in multi‐resolution imprinting.
13
AMP [ICCV 2019]
Siam et al., AMP: Adaptive masked proxies for few‐shot segmentation, ICCV 2019.
CANet [CVPR 2019]
• The dense comparison module (DCM) concatenates the prototype to each spatial location of the query feature map
• Rough segmentation maps are produced after comparison with the mask‐pooled feature prototypes
• The final result is refined in an iterative manner
Zhang, Chi, et al. CANet: Class‐agnostic segmentation networks with iterative refinement and attentive FSL, CVPR 2019.
CANet [CVPR 2019]
Zhang, Chi, et al. CANet: Class‐agnostic segmentation networks with iterative refinement and attentive FSL, CVPR 2019.
Nguyen and Todorovic, Feature weighting and boosting for few‐shot segmentation, ICCV 2019.
FWB [ICCV 2019]
• Standard FSL methods (e.g., shared backbone, masked pooling…) are used during training.
• A ‘relevance’ factor is added and taken into account during cosine similarity computation.
17
FWB [ICCV 2019]
• During inference, an ensemble is utilized to select the best set of parameters
• Prototypes are used to predict the support masks in reverse, which can then be compared to the ground truth.
Nguyen and Todorovic, Feature weighting and boosting for few‐shot segmentation, ICCV 2019.
PANet [ICCV 2019]
• Extracted prototypes are first used to predict query masks, as in standard FSL methods.
• Predicted query masks are then used to generate new prototypes, which in turn predict the support masks in reverse
• A similar concept to ‘cycle consistency’ (support→query; query→support)
Wang, Kaixin, et al., PANet: Few‐shot image semantic segmentation with prototype alignment, ICCV 2019.
PANet [ICCV 2019]
Wang, Kaixin, et al., PANet: Few‐shot image semantic segmentation with prototype alignment, ICCV 2019.
Dataset & Evaluation Metric
• Datasets
  • PASCAL VOC 2012 (main)
    • 20 classes
    • Split: 15 base + 5 novel
  • COCO (secondary)
• Evaluation Metrics
  • Binary‐mIoU (difficult)
  • FB‐mIoU (easy)
    • Foreground/Background IoU
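For reference, the foreground/background IoU metric can be computed as follows (a minimal NumPy sketch; dataset‐specific ignore labels are omitted):

```python
import numpy as np

def iou(pred, gt, cls):
    """IoU of one class between integer-label maps `pred` and `gt`."""
    p, g = (pred == cls), (gt == cls)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else 1.0

def fb_miou(pred, gt):
    """FB-mIoU: mean of foreground (label 1) and background (label 0) IoU."""
    return (iou(pred, gt, 1) + iou(pred, gt, 0)) / 2
```

Binary‐mIoU instead averages the foreground IoU over the novel classes, which is why it is the harder metric.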
21
Performance Comparisons
Method | Split‐0 | Split‐1 | Split‐2 | Split‐3 | Mean
Reduced‐DFCN8s | 39.2 | 48.0 | 39.3 | 34.2 | 40.2
OSLSM (BMVC 2017) | 33.6 | 55.3 | 40.9 | 33.5 | 40.8
co‐FCN (ICLRW 2018) | 36.7 | 50.6 | 44.9 | 32.4 | 41.2
AMP (ICCV 2019) | 41.9 | 50.2 | 46.7 | 34.7 | 43.4
SG‐One | 40.2 | 58.4 | 48.4 | 38.4 | 46.4
PANet (ICCV 2019) | 42.3 | 58.0 | 51.1 | 41.2 | 48.1
PRNet | 51.6 | 61.3 | 53.1 | 47.6 | 53.4
Co‐att | 49.5 | 65.5 | 50.0 | 49.2 | 53.5
CANet (CVPR 2019) | 52.5 | 65.9 | 51.3 | 51.9 | 55.4
PGNet (ICCV 2019) | 56.0 | 66.9 | 50.6 | 50.4 | 56.0
FWB (ICCV 2019) | 51.3 | 64.5 | 56.7 | 52.2 | 56.2
22
Object Detection
• Focus on object search: “Where is it?”
• Build templates that quickly differentiate object patch from background patch
Object or Non‐Object?
Dog Model
24
Two‐Stage vs. One‐Stage Object Detection
Methods
Sliding Windows
Two‐stage Frameworks
  R‐CNN
  Fast R‐CNN
  Mask R‐CNN
  ⋮
One‐stage Frameworks
  YOLO
  YOLOv2
  YOLOv3
  ⋮
25
Few‐Shot Object Detection
• What if one cannot collect a sufficient amount of training data for the objects of interest? → Small Data Problem!
• Applications: defect detection, medical image analysis, etc.
26
(Figure: example of 1‐shot object detection; support #1, support #2, and a query image.)
Few‐Shot Object Detection with Attention‐RPN & Multi‐Relation Detector [CVPR’20]
• Possible solution: meta‐learning + object detection
• Network architecture (applicable to the N‐way K‐shot setting)
• See the following 1‐way 1‐shot object detection for example:
27
Frustratingly Simple Few‐Shot Object Detection [ICML’20]
• Possible solution: object detection + fine tuning + meta‐learning?
• Network architecture
28
Frustratingly Simple Few‐Shot Object Detection [ICML’20]
• Possible solution: object detection + fine tuning or meta‐learning
• Network architecture
Balanced N‐way K‐shot settings
Recap: Domain Adaptation
• Domain‐Adversarial Training of Neural Networks (DANN)
  • Y. Ganin et al., ICML 2015
  • Maximize domain confusion = maximize the domain classification loss
  • Minimize the source‐domain data classification loss
  • The derived feature f can be viewed as a disentangled & domain‐invariant feature.
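The adversarial objective in DANN is usually implemented with a gradient reversal layer, sketched below (a minimal NumPy sketch; the class name and interface are illustrative, not Ganin et al.'s code):

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer (as in DANN): identity in the forward pass,
    multiplies the incoming gradient by -lambda in the backward pass, so the
    feature extractor *maximizes* the domain-classification loss that the
    domain classifier minimizes."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                       # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flipped gradient reaches the backbone
```

Placing this layer between the feature extractor and the domain classifier lets a single backward pass train both adversaries simultaneously.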
31
Domain Generalization
• Input: Images and labels from multiple source domains
• Output: A well‐generalized model for unseen target domains
32
DS = {Photo, Painting, Cartoon}
DT = {Sketch}
Strategy of Episodic Training
• Episodic training for domain generalization (ICCV’19)
• Generalize across domains via Meta‐Learning
Li et al.: Episodic training for domain generalization. In ICCV (2019)
• Motivation
34
(Figure: domain‐specific models vs. an aggregated model trained with episodic training.)
Episodic Training (cont’d)
35
• Randomly sample two domains, e.g., Photo and Cartoon
Episodic Training (cont’d)
36
• Randomly sample two domains, e.g., Photo and Cartoon
Episodic Training (cont’d)
37
Experiments
• Domain Generalized Classification
Li et al.: Episodic training for domain generalization. In ICCV (2019)
Experiments (cont’d)
Challenges & Opportunities in Small‐Data Problems
• Imbalanced Data Learning
  • Some categories have a sufficient # of data, while others do not → Small Data Problem!
  • E.g., medical image analysis, defect detection, etc.
• Possible Solutions
  • Reweighting instances or loss functions accordingly
  • Data augmentation/hallucination
  • However, augmenting/hallucinating data requires domain knowledge!
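The reweighting idea above can be sketched as follows (a minimal NumPy sketch using inverse‐frequency class weights; function names are illustrative):

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency class weights: rare classes receive larger weights."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1))

def weighted_ce(probs, labels, weights):
    """Cross-entropy over predicted probabilities, reweighted per class."""
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return (weights[labels] * nll).mean()
```

With a 9:1 imbalance between two classes, the minority class is weighted five times as heavily as the majority class, so its mistakes dominate the loss.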
41
Challenges & Opportunities in Small‐Data Problems
• Learning with Partial Supervision
  • No pixel‐level ground truth; only image‐level labels available → Small Data Problem!
    e.g., medical image detection or segmentation
• Possible Solutions
  • Active Learning (human‐in‐the‐loop)
  • Semi‐Supervised Learning (at least collect a few images with pixel‐wise labels)
  • Weakly Supervised Learning (e.g., multiple instance learning)
  • Can be guided with auxiliary info (e.g., location or number of objects in an image)
42
(Figure: medical image with only an image‐level label: Tumor? Yes/No)
More Opportunities in Small‐Data Problems
• Self‐Supervised Learning (SSL)
  • A properly trained network backbone is the King!
    (recall examples in transfer learning, domain adaptation, or few‐shot learning)
  • Typically implemented in an unsupervised manner (e.g., via contrastive learning)
  • Let’s talk about SSL first…
43
Supervised Learning
45
• Deep learning plus supervised learning are rocking the world ...
46
• In real‐world scenarios, data annotation is quite time‐consuming
• Could one exploit supervisory signals from unlabeled data?
Self‐Supervised Learning (SSL)
47
• Learning discriminative representations from unlabeled data
• Create self‐supervised tasks via data augmentation
(Examples: 90° rotation, jigsaw puzzle, colorization)
Self‐Supervised Learning (SSL)
48
• Self‐Supervised Pretraining
• Supervised Fine‐tuning
Self‐Supervised Learning (SSL)
49
• Pretext Tasks
  • Jigsaw (ECCV’16)
  • RotNet (ICLR’18)
• Contrastive Learning
  • CPC (ICML’20)
  • SimCLR (ICML’20)
• Learning w/o negative samples
  • BYOL (NeurIPS’20)
  • Barlow Twins (ICML’21)
Jigsaw Puzzle
50
• Assign the permutation index and perform augmentation
• Solve jigsaw puzzle by predicting the permutation index
Noroozi et al. “Unsupervised learning of visual representations by solving jigsaw puzzles.” ECCV 2016
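The jigsaw pretext labels can be constructed as below (a minimal sketch; the actual paper shuffles 9 patches and uses a fixed set of 64–100 permutations chosen for large Hamming distance, whereas these 4‐patch permutations are purely illustrative):

```python
import numpy as np

# A fixed, pre-chosen set of patch permutations; predicting the index of the
# applied permutation is the self-supervised task (illustrative 4-patch set).
PERMS = [(0, 1, 2, 3), (3, 2, 1, 0), (1, 3, 0, 2), (2, 0, 3, 1)]

def make_jigsaw_example(patches, rng):
    """Shuffle the patches with a randomly chosen permutation; the
    permutation index serves as the self-supervised label."""
    label = int(rng.integers(len(PERMS)))
    return [patches[i] for i in PERMS[label]], label
```

The network sees only the shuffled patches and must recover the permutation index, which requires understanding object layout.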
RotNet
51
• Learning to predict the rotation angle
Gidaris et al. “Unsupervised Representation Learning by Predicting Image Rotations.” ICLR 2018
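The rotation pretext task is even simpler to set up (a minimal NumPy sketch; the label k ∈ {0, 1, 2, 3} indexes the multiple of 90° applied):

```python
import numpy as np

def make_rotation_example(img, rng):
    """Rotate the image by k * 90 degrees; k is the self-supervised label
    the network must predict from the rotated image alone."""
    k = int(rng.integers(4))
    return np.rot90(img, k), k
```

Because the rotation is exactly invertible, the original image can always be recovered from the rotated one and its label.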
RotNet
52
• Filters learned with SSL exhibit more variety
Self‐Supervised Learning (SSL)
53
• Pretext Tasks
  • Jigsaw (ECCV’16)
  • RotNet (ICLR’18)
• Contrastive Learning
  • CPC (ICML’20)
  • SimCLR (ICML’20)
• Learning w/o negative samples
  • BYOL (NeurIPS’20)
  • Barlow Twins (ICML’21)
Contrastive Predictive Coding (CPC)
Henaff et al. “Data‐efficient image recognition with contrastive predictive coding.” ICML 2020
• Sample positive patches from itself and negative patches from other images
• Maximize positive similarities and minimize negative ones
SimCLR
Chen et al. "A simple framework for contrastive learning of visual representations." ICML 2020
• Attract augmented images and repel negative samples
• Improve the quality with projection heads (g)…why?
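The attract/repel objective is the NT‐Xent (normalized temperature‐scaled cross‐entropy) loss, sketched here in NumPy (function name and shapes are illustrative; real implementations operate on learned projection‐head embeddings inside an autodiff framework):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch: z1, z2 are (N, d) embeddings of two
    augmented views of the same N images. For each embedding, its other
    view is the positive; all remaining 2N - 2 embeddings are negatives."""
    z = np.concatenate([z1, z2], axis=0)                  # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit vectors -> cosine sim
    sim = z @ z.T / tau                                   # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # positive indices
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)      # cross-entropy per sample
    return loss.mean()
```

The loss is minimized when each pair of views is more similar to each other than to every other sample in the batch.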
SimCLR
56
• Experiments on semi‐supervised settings
Self‐Supervised Learning (SSL)
57
• Pretext Tasks
  • Jigsaw (ECCV’16)
  • RotNet (ICLR’18)
• Contrastive Learning
  • CPC (ICML’20)
  • SimCLR (ICML’20)
• Learning w/o negative samples
  • BYOL (NeurIPS’20)
  • Barlow Twins (ICML’21)
Bootstrap Your Own Latent (BYOL)
58
• No need for negative pairs
• Introduces a predictor for asymmetry, to avoid collapse
• Updates the target network via an Exponential Moving Average (EMA) of the online network
Grill et al. “Bootstrap your own latent: A new approach to self‐supervised learning.” NeurIPS 2020
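The EMA target update can be sketched as follows (a minimal sketch; BYOL uses a decay rate close to 1, e.g., 0.996, annealed toward 1 during training, and the target network receives no gradient updates of its own):

```python
import numpy as np

def ema_update(target, online, m=0.99):
    """BYOL-style target-network update: each target parameter is an
    exponential moving average of the corresponding online parameter."""
    return {k: m * target[k] + (1 - m) * online[k] for k in target}
```

The slowly moving target provides stable regression targets for the online network's predictor, which is what prevents representational collapse without negatives.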
Barlow Twins
60
• Enforce diversity among feature dimensions
• Maximize the diagonal terms and minimize the off‐diagonal terms of the cross‐correlation matrix between the two views’ embeddings
• No need for negative pairs, a predictor network, gradient stopping, or moving‐average techniques
Zbontar et al. “Barlow twins: Self‐supervised learning via redundancy reduction.” ICML 2021
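The Barlow Twins objective can be sketched in NumPy as follows (function name and the trade‐off weight are illustrative; the paper applies this to projection‐head embeddings standardized over the batch):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins: push the cross-correlation matrix C between the
    batch-standardized embeddings of two views toward the identity --
    diagonal -> 1 (invariance), off-diagonal -> 0 (redundancy reduction)."""
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                                  # (d, d) cross-correlation
    on_diag = ((np.diag(c) - 1) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Because the target is the identity matrix rather than a contrast against negatives, no large batches, predictor, or momentum encoder is required.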
Barlow Twins
61
• Experiments on classification
Barlow Twins
62
• Experiments on detection and segmentation
What We’ve Covered Today…
• Meta‐Learning for Few‐Shot Learning
  • Few‐Shot Image Segmentation
  • Few‐Shot Object Detection
• Meta‐Learning for Domain Generalization
  • From Domain Adaptation to Domain Generalization
• Challenges in Few‐Shot Learning Tasks
• Self‐Supervised Learning (SSL)
  • Pretext Tasks
  • Contrastive Learning
  • Contrastive Learning w/o Negative Pairs
63