Deep Learning for Computer Vision, Fall 2021
http://vllab.ee.ntu.edu.tw/dlcv.html (Public website)
https://cool.ntu.edu.tw/courses/8854 (NTU COOL)
Yu‐Chiang Frank Wang 王鈺強, Professor
Dept. Electrical Engineering, National Taiwan University
2021/12/14
What to Cover Today…
• Meta‐Learning for Few‐Shot Learning
  • Few‐Shot Image Segmentation
  • Few‐Shot Object Detection
• Meta‐Learning for Domain Generalization
  • From Domain Adaptation to Domain Generalization
• Challenges in Few‐Shot Learning Tasks
• Self‐Supervised Learning (SSL)
  • Pretext Tasks
  • Contrastive Learning
Selected slide credits: C. Finn, S. Levine, & H.‐Y. Lee
What If Only Limited Amount of Data Available?
• Naive transfer?
• Model finetuning:
  e.g., train a learning model (e.g., a CNN) on large‐scale data (base classes), followed by finetuning on small‐scale data (novel classes).
  ‐ That is, freeze the feature backbone (learned from the base classes) and learn classifier weights for the novel classes.
• Possibly poor generalization
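The freeze‐and‐finetune baseline above can be sketched as follows (a minimal NumPy sketch; the function name and hyperparameters are illustrative, not from the slides). The backbone is frozen, so the features are fixed inputs and only the classifier weights are learned:

```python
import numpy as np

def train_novel_classifier(feats, labels, n_classes, lr=0.1, steps=200):
    """Finetuning with a frozen backbone: `feats` are fixed features f(x)
    of the novel-class images; only the classifier weights W, b are learned
    (multinomial logistic regression trained by gradient descent)."""
    n, d = feats.shape
    W, b = np.zeros((d, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]                      # one-hot labels
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                                # softmax cross-entropy gradient
        W -= lr * feats.T @ G
        b -= lr * G.sum(axis=0)
    return W, b
```

In practice the frozen features would come from a CNN pretrained on the base classes; here they are simply given as input.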
3
(Figure: long‐tailed distribution of # of data per category; a few classes have big data, while many, e.g., objects of interest or driving scenarios, have only small data.)
Meta Learning = Learning to Learn
• A powerful solution for learning from few‐shot data
• Let’s consider the following “2‐way 1‐shot” learning scheme:
4
(Figure: meta‐training over a series of tasks, each with a support (train) set and a query (test) set.
Task i: Cat (+) vs. Dog (‐); Task i+1: Apple (+) vs. Orange (‐); …
Meta‐testing on a novel task: given a support set of Bike (+) vs. Car (‐), predict whether a query Bike image is + or ‐.)
Slide credit: H.‐Y. Lee
Learn to Compare…with the Representative Ones!
• Prototypical Networks
  • Learn a model which properly describes data in terms of intra‐/inter‐class information.
  • It learns a prototype for each class, with data similarity/separation guarantees.
• For the deep version, the learned feature space is derived by a non‐linear mapping 𝑓_φ, and the representative (i.e., prototype) of each class k is its mean feature vector 𝐜_k.
5
Meta‐Training Stage: given the support set S = {(x_i, y_i)}, compute each class prototype as
𝐜_k = (1 / |S_k|) Σ_{(x_i, y_i) ∈ S_k} 𝑓_φ(x_i),
where S_k ⊂ S indicates the support‐set examples of class k.
Meta‐Testing Stage: classify a query x by comparing 𝑓_φ(x) against the prototypes {𝐜_k} (e.g., via a softmax over negative distances).
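The prototype computation and nearest‐prototype classification can be sketched in NumPy as follows (names are illustrative; Prototypical Networks train 𝑓_φ end‐to‐end with a softmax over negative distances, while here the features are given):

```python
import numpy as np

def prototypes(support_feats, support_labels, n_way):
    """c_k = mean of f(x_i) over the support examples of class k."""
    return np.stack([support_feats[support_labels == k].mean(axis=0)
                     for k in range(n_way)])

def classify(query_feats, protos):
    """Assign each query to the nearest prototype (squared Euclidean
    distance); a softmax over negative distances gives the same argmax."""
    d2 = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

For a 2‐way 1‐shot episode, each prototype is simply the single support feature of its class.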
A Closer Look at FSL (3/3)
• Performance with domain shifts (using ResNet‐18)
  • Existing FSL methods fail to address large domain shifts (e.g., mini‐ImageNet → CUB) and are inferior to the baseline methods.
  • This highlights the importance of learning to adapt to domain differences in FSL.
Chen et al., A Closer Look at Few‐shot Classification, ICLR 2019
Semantic Segmentation
• Goal
  • Assign a class label to each pixel in the input image
  • Don’t differentiate instances; only care about pixels
7
Few‐Shot Segmentation
• A large number of image categories come with pixel‐wise ground‐truth labels, while a small number of them have only limited pixel‐wise annotations.
• A shared backbone produces feature maps for both support and query images.
• The prototype for each class is obtained by masked pooling over the support feature maps.
• Query feature maps are then compared with the pooled prototypes pixel‐by‐pixel.
• Typically, cosine similarity is adopted for pixel‐wise feature comparison.
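The masked‐pooling and pixel‐wise comparison steps above can be sketched as follows (a minimal NumPy sketch with illustrative names; real methods operate on CNN feature maps of support and query images):

```python
import numpy as np

def masked_pool(feat, mask):
    """Masked average pooling: average an (H, W, C) support feature map
    over foreground pixels (mask == 1) to obtain a class prototype (C,)."""
    m = mask.astype(float)
    return (feat * m[..., None]).sum((0, 1)) / max(m.sum(), 1e-8)

def cosine_map(query_feat, proto):
    """Pixel-wise cosine similarity between a query feature map and a prototype."""
    q = query_feat / (np.linalg.norm(query_feat, axis=-1, keepdims=True) + 1e-8)
    p = proto / (np.linalg.norm(proto) + 1e-8)
    return q @ p          # (H, W) similarity map
```

Thresholding (or taking an argmax over) the per‐class similarity maps yields the predicted query mask.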
8
OSLSM [BMVC 2017]
Shaban, Amirreza, et al. "One‐shot learning for semantic segmentation." BMVC 2017
• 𝑆 is an annotated image from a new semantic class
• Input 𝑆 to a function 𝑔 that outputs a set of parameters 𝜃
• 𝜃 is used to parameterize part of the segmentation model, which produces a segmentation mask for the image 𝐼 to be segmented
9
OSLSM [BMVC 2017]
Shaban, Amirreza, et al. "One‐shot learning for semantic segmentation." BMVC 2017
Dong, Nanqing, and Eric Xing. "Few‐Shot Semantic Segmentation with Prototype Learning." BMVC. 2018.
Prototype Learning [BMVC 2018]
• A prototype is learned for each foreground class and the background class.
• Prototypes are used to predict rough segmentation maps for each class.
• The final prediction is optimized using probability fusion.
11
PL [BMVC 2018]
Dong, Nanqing, and Eric Xing. "Few‐Shot Semantic Segmentation with Prototype Learning." BMVC 2018.
Siam et al., AMP: Adaptive masked proxies for few‐shot segmentation, ICCV 2019.
AMP [ICCV 2019]
• Adaptive masked proxies (i.e., prototypes) are extracted for each semantic class.
• Proxies update themselves in a continuous stream of data (e.g., video).
• Proxies from different resolution levels are used in multi‐resolution imprinting.
13
AMP [ICCV 2019]
Siam et al., AMP: Adaptive masked proxies for few‐shot segmentation, ICCV 2019.
CANet [CVPR 2019]
• The dense comparison module (DCM) concatenates the prototype to each spatial location of the query feature map
• Rough segmentation maps are produced after comparison with the mask‐pooled feature prototypes
• The final result is refined in an iterative manner
Zhang, Chi, et al. CANet: Class‐agnostic segmentation networks with iterative refinement and attentive FSL, CVPR 2019.
CANet [CVPR 2019]
Zhang, Chi, et al. CANet: Class‐agnostic segmentation networks with iterative refinement and attentive FSL, CVPR 2019.
Nguyen and Todorovic, Feature weighting and boosting for few‐shot segmentation, ICCV 2019.
FWB [ICCV 2019]
• Standard FSL methods (e.g., shared backbone, masked pooling…) are used during training.
• A ‘relevance’ factor is added and taken into account during cosine similarity computation.
17
FWB [ICCV 2019]
• During inference, an ensemble is utilized to select the best set of parameters
• Prototypes are used to predict the support masks in reverse, which can then be compared to the ground truth.
Nguyen and Todorovic, Feature weighting and boosting for few‐shot segmentation, ICCV 2019.
PANet [ICCV 2019]
• Extracted prototypes are first used to predict query masks, as in standard FSL methods.
• Predicted query masks are then used to generate new prototypes, which in turn predict the support masks in reverse
• A similar concept to ‘cycle consistency’ (support→query; query→support)
Wang, Kaixin, et al., PANet: Few‐shot image semantic segmentation with prototype alignment, ICCV 2019.
PANet [ICCV 2019]
Wang, Kaixin, et al., PANet: Few‐shot image semantic segmentation with prototype alignment, ICCV 2019.
Dataset & Evaluation Metric
• Datasets
  • PASCAL VOC 2012 (main)
    • 20 classes
    • Split: 15 base + 5 novel
  • COCO (secondary)
• Evaluation Metrics
  • Binary‐mIoU (difficult)
  • FB‐mIoU (easy)
    • Foreground/Background IoU
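For reference, the foreground/background IoU metric can be computed as follows (a minimal NumPy sketch; dataset‐specific ignore labels are omitted):

```python
import numpy as np

def iou(pred, gt, cls):
    """IoU of one class between integer-label maps `pred` and `gt`."""
    p, g = (pred == cls), (gt == cls)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else 1.0

def fb_miou(pred, gt):
    """FB-mIoU: mean of foreground (label 1) and background (label 0) IoU."""
    return (iou(pred, gt, 1) + iou(pred, gt, 0)) / 2
```

Binary‐mIoU instead averages the foreground IoU over the novel classes, which is why it is the harder metric.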
21
Performance Comparisons
Method | Split‐0 | Split‐1 | Split‐2 | Split‐3 | Mean
Reduced‐DFCN8s | 39.2 | 48.0 | 39.3 | 34.2 | 40.2
OSLSM (BMVC 2017) | 33.6 | 55.3 | 40.9 | 33.5 | 40.8
co‐FCN (ICLRW 2018) | 36.7 | 50.6 | 44.9 | 32.4 | 41.2
AMP (ICCV 2019) | 41.9 | 50.2 | 46.7 | 34.7 | 43.4
SG‐One | 40.2 | 58.4 | 48.4 | 38.4 | 46.4
PANet (ICCV 2019) | 42.3 | 58.0 | 51.1 | 41.2 | 48.1
PRNet | 51.6 | 61.3 | 53.1 | 47.6 | 53.4
Co‐att | 49.5 | 65.5 | 50.0 | 49.2 | 53.5
CANet (CVPR 2019) | 52.5 | 65.9 | 51.3 | 51.9 | 55.4
PGNet (ICCV 2019) | 56.0 | 66.9 | 50.6 | 50.4 | 56.0
FWB (ICCV 2019) | 51.3 | 64.5 | 56.7 | 52.2 | 56.2
22
Object Detection
• Focus on object search: “Where is it?”
• Build templates that quickly differentiate object patch from background patch
Object or Non‐Object?
Dog Model
24
Two‐Stage vs. One‐Stage Object Detection
Methods
Sliding Windows
Two‐stage Frameworks
  R‐CNN
  Fast R‐CNN
  Mask R‐CNN
  ⋮
One‐stage Frameworks
  YOLO
  YOLOv2
  YOLOv3
  ⋮
25
Few‐Shot Object Detection
• What if one cannot collect a sufficient amount of training data for the objects of interest? → Small Data Problem!
• Applications: defect detection, medical image analysis, etc.
26
(Figure: example of 1‐shot object detection; support #1, support #2, and a query image.)
Few‐Shot Object Detection with Attention‐RPN & Multi‐Relation Detector [CVPR’20]
• Possible solution: meta‐learning + object detection
• Network architecture (applicable to the N‐way K‐shot setting)
• See the following 1‐way 1‐shot object detection for example:
27
Frustratingly Simple Few‐Shot Object Detection [ICML’20]
• Possible solution: object detection + fine tuning + meta‐learning?
• Network architecture
28
Frustratingly Simple Few‐Shot Object Detection [ICML’20]
• Possible solution: object detection + fine tuning or meta‐learning
• Network architecture
Balanced N‐way K‐shot settings
Recap: Domain Adaptation
• Domain‐Adversarial Training of Neural Networks (DANN)
  • Y. Ganin et al., ICML 2015
  • Maximize domain confusion = maximize the domain classification loss
  • Minimize the source‐domain data classification loss
  • The derived feature f can be viewed as a disentangled & domain‐invariant feature.
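The adversarial objective in DANN is usually implemented with a gradient reversal layer, sketched below (a minimal NumPy sketch; the class name and interface are illustrative, not Ganin et al.'s code):

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer (as in DANN): identity in the forward pass,
    multiplies the incoming gradient by -lambda in the backward pass, so the
    feature extractor *maximizes* the domain-classification loss that the
    domain classifier minimizes."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                       # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # flipped gradient reaches the backbone
```

Placing this layer between the feature extractor and the domain classifier lets a single backward pass train both adversaries simultaneously.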
31
Domain Generalization
• Input: Images and labels from multiple source domains
• Output: A well‐generalized model for unseen target domains
32
DS = {Photo, Painting, Cartoon}
DT = {Sketch}
Strategy of Episodic Training
• Episodic training for domain generalization (ICCV’19)
• Generalize across domains via Meta‐Learning
Li et al.: Episodic training for domain generalization. In ICCV (2019)
• Motivation
34
(Figure: domain‐specific models vs. an aggregated model trained with episodic training.)
Episodic Training (cont’d)
35
• Randomly sample two domains, e.g., Photo and Cartoon
Episodic Training (cont’d)
36
• Randomly sample two domains, e.g., Photo and Cartoon
Episodic Training (cont’d)
37
Experiments
• Domain Generalized Classification
Li et al.: Episodic training for domain generalization. In ICCV (2019)
Experiments (cont’d)
Challenges & Opportunities in Small‐Data Problems
• Imbalanced Data Learning
  • Some categories have a sufficient # of data, while others do not → Small Data Problem!
  • E.g., medical image analysis, defect detection, etc.
• Possible Solutions
  • Reweighting instances or loss functions accordingly
  • Data augmentation/hallucination
  • However, augmenting/hallucinating data requires domain knowledge!
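The reweighting idea above can be sketched as follows (a minimal NumPy sketch using inverse‐frequency class weights; function names are illustrative):

```python
import numpy as np

def class_weights(labels, n_classes):
    """Inverse-frequency class weights: rare classes receive larger weights."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1))

def weighted_ce(probs, labels, weights):
    """Cross-entropy over predicted probabilities, reweighted per class."""
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return (weights[labels] * nll).mean()
```

With a 9:1 imbalance between two classes, the minority class is weighted five times as heavily as the majority class, so its mistakes dominate the loss.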
41
Challenges & Opportunities in Small‐Data Problems
• Learning with Partial Supervision
  • No pixel‐level ground truth; only image‐level labels available → Small Data Problem!
    e.g., medical image detection or segmentation
• Possible Solutions
  • Active Learning (human‐in‐the‐loop)
  • Semi‐Supervised Learning (at least collect a few images with pixel‐wise labels)
  • Weakly Supervised Learning (e.g., multiple instance learning)
  • Can be guided with auxiliary info (e.g., location or number of objects in an image)
42
(Figure: medical image with only an image‐level label: Tumor? Yes/No)
More Opportunities in Small‐Data Problems
• Self‐Supervised Learning (SSL)
  • A properly trained network backbone is the King!
    (recall examples in transfer learning, domain adaptation, or few‐shot learning)
  • Typically implemented in an unsupervised manner (e.g., via contrastive learning)
  • Let’s talk about SSL first…
43
Supervised Learning
45
• Deep learning plus supervised learning are rocking the world ...
46
• In real‐world scenarios, data annotation is quite time‐consuming
• Could one exploit supervisory signals from unlabeled data?
Self‐Supervised Learning (SSL)
47
• Learning discriminative representations from unlabeled data
• Create self‐supervised tasks via data augmentation
(Examples: 90° rotation, jigsaw puzzle, colorization)
Self‐Supervised Learning (SSL)
48
• Self‐Supervised Pretraining
• Supervised Fine‐tuning
Self‐Supervised Learning (SSL)
49
• Pretext Tasks
  • Jigsaw (ECCV’16)
  • RotNet (ICLR’18)
• Contrastive Learning
  • CPC (ICML’20)
  • SimCLR (ICML’20)
• Learning w/o negative samples
  • BYOL (NeurIPS’20)
  • Barlow Twins (ICML’21)
Jigsaw Puzzle
50
• Assign the permutation index and perform augmentation
• Solve jigsaw puzzle by predicting the permutation index
Noroozi et al. “Unsupervised learning of visual representations by solving jigsaw puzzles.” ECCV 2016
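The jigsaw pretext labels can be constructed as below (a minimal sketch; the actual paper shuffles 9 patches and uses a fixed set of 64–100 permutations chosen for large Hamming distance, whereas these 4‐patch permutations are purely illustrative):

```python
import numpy as np

# A fixed, pre-chosen set of patch permutations; predicting the index of the
# applied permutation is the self-supervised task (illustrative 4-patch set).
PERMS = [(0, 1, 2, 3), (3, 2, 1, 0), (1, 3, 0, 2), (2, 0, 3, 1)]

def make_jigsaw_example(patches, rng):
    """Shuffle the patches with a randomly chosen permutation; the
    permutation index serves as the self-supervised label."""
    label = int(rng.integers(len(PERMS)))
    return [patches[i] for i in PERMS[label]], label
```

The network sees only the shuffled patches and must recover the permutation index, which requires understanding object layout.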
RotNet
51
• Learning to predict the rotation angle
Gidaris et al. “Unsupervised Representation Learning by Predicting Image Rotations.” ICLR 2018
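The rotation pretext task is even simpler to set up (a minimal NumPy sketch; the label k ∈ {0, 1, 2, 3} indexes the multiple of 90° applied):

```python
import numpy as np

def make_rotation_example(img, rng):
    """Rotate the image by k * 90 degrees; k is the self-supervised label
    the network must predict from the rotated image alone."""
    k = int(rng.integers(4))
    return np.rot90(img, k), k
```

Because the rotation is exactly invertible, the original image can always be recovered from the rotated one and its label.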
RotNet
52
• Filters learned with SSL exhibit more variety
Self‐Supervised Learning (SSL)
53
• Pretext Tasks
  • Jigsaw (ECCV’16)
  • RotNet (ICLR’18)
• Contrastive Learning
  • CPC (ICML’20)
  • SimCLR (ICML’20)
• Learning w/o negative samples
  • BYOL (NeurIPS’20)
  • Barlow Twins (ICML’21)
Contrastive Predictive Coding (CPC)
Henaff et al. “Data‐efficient image recognition with contrastive predictive coding.” ICML 2020
• Sample positive patches from itself and negative patches from other images
• Maximize positive similarities and minimize negative ones
SimCLR
Chen et al. "A simple framework for contrastive learning of visual representations." ICML 2020
• Attract augmented images and repel negative samples
• Improve the quality with projection heads (g)…why?
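The attract/repel objective is the NT‐Xent (normalized temperature‐scaled cross‐entropy) loss, sketched here in NumPy (function name and shapes are illustrative; real implementations operate on learned projection‐head embeddings inside an autodiff framework):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch: z1, z2 are (N, d) embeddings of two
    augmented views of the same N images. For each embedding, its other
    view is the positive; all remaining 2N - 2 embeddings are negatives."""
    z = np.concatenate([z1, z2], axis=0)                  # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit vectors -> cosine sim
    sim = z @ z.T / tau                                   # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # positive indices
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)      # cross-entropy per sample
    return loss.mean()
```

The loss is minimized when each pair of views is more similar to each other than to every other sample in the batch.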
SimCLR
56
• Experiments on semi‐supervised settings
Self‐Supervised Learning (SSL)
57
• Pretext Tasks
  • Jigsaw (ECCV’16)
  • RotNet (ICLR’18)
• Contrastive Learning
  • CPC (ICML’20)
  • SimCLR (ICML’20)
• Learning w/o negative samples
  • BYOL (NeurIPS’20)
  • Barlow Twins (ICML’21)
Bootstrap Your Own Latent (BYOL)
58
• No need for negative pairs
• Introduces a predictor for asymmetry, to avoid collapse
• Updates the target network via an Exponential Moving Average (EMA) of the online network
Grill et al. “Bootstrap your own latent: A new approach to self‐supervised learning.” NeurIPS 2020
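The EMA target update can be sketched as follows (a minimal sketch; BYOL uses a decay rate close to 1, e.g., 0.996, annealed toward 1 during training, and the target network receives no gradient updates of its own):

```python
import numpy as np

def ema_update(target, online, m=0.99):
    """BYOL-style target-network update: each target parameter is an
    exponential moving average of the corresponding online parameter."""
    return {k: m * target[k] + (1 - m) * online[k] for k in target}
```

The slowly moving target provides stable regression targets for the online network's predictor, which is what prevents representational collapse without negatives.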
Barlow Twins
60
• Enforce diversity among feature dimensions
• Maximize the diagonal terms and minimize the off‐diagonal terms of the cross‐correlation matrix between the two views’ embeddings
• No need for negative pairs, a predictor network, gradient stopping, or moving‐average techniques
Zbontar et al. “Barlow twins: Self‐supervised learning via redundancy reduction.” ICML 2021
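The Barlow Twins objective can be sketched in NumPy as follows (function name and the trade‐off weight are illustrative; the paper applies this to projection‐head embeddings standardized over the batch):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins: push the cross-correlation matrix C between the
    batch-standardized embeddings of two views toward the identity --
    diagonal -> 1 (invariance), off-diagonal -> 0 (redundancy reduction)."""
    n = z1.shape[0]
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                                  # (d, d) cross-correlation
    on_diag = ((np.diag(c) - 1) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Because the target is the identity matrix rather than a contrast against negatives, no large batches, predictor, or momentum encoder is required.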
Barlow Twins
61
• Experiments on classification
Barlow Twins
62
• Experiments on detection and segmentation
What We’ve Covered Today…
• Meta‐Learning for Few‐Shot Learning
  • Few‐Shot Image Segmentation
  • Few‐Shot Object Detection
• Meta‐Learning for Domain Generalization
  • From Domain Adaptation to Domain Generalization
• Challenges in Few‐Shot Learning Tasks
• Self‐Supervised Learning (SSL)
  • Pretext Tasks
  • Contrastive Learning
  • Contrastive Learning w/o Negative Pairs
63