Few-Shot and Zero-Shot Learning
Xiaolong Wang
This Class
• Few-shot learning
• Meta-learning for few-shot learning
• Zero-shot learning
Few-shot Learning
The Problem
• Humans can learn a novel concept from a few samples
• Goal: let machine learning algorithms learn from a few samples
(Figure: recognizing a novel concept, a Saiga antelope, from a few sample images)
Introduction
• Issue: learning with insufficient data causes overfitting
Introduction
• Intuition: humans can learn quickly, as they have a lot of relevant experience
Introduction
• Solution: transfer learning
• Base classes: classes with sufficient samples (training set)
• Novel classes: classes with only a few samples
N-way K-shot Task
• N novel classes
• Support-set: N×K images (K per class; the training set of the task)
• Query-set: images to classify, typically N×Q
• Common evaluation protocol
• Tasks are sampled from the dataset for evaluation
• In the following, “task” / “FSL task” denotes an N-way K-shot task by default
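The sampling procedure above can be sketched in a few lines. This is an illustrative helper (the function and variable names are not from any specific library): given per-image class labels, it draws N classes, then K support and Q query images per class.

```python
import random
from collections import defaultdict

def sample_task(labels, n_way=5, k_shot=1, q_query=15, rng=None):
    """Sample an N-way K-shot task: returns (support, query) index lists."""
    rng = rng or random.Random(0)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = rng.sample(sorted(by_class), n_way)  # pick N novel classes
    support, query = [], []
    for c in classes:
        picks = rng.sample(by_class[c], k_shot + q_query)
        support += picks[:k_shot]   # K shots per class
        query += picks[k_shot:]     # Q query images per class
    return support, query

labels = [i // 30 for i in range(300)]  # toy dataset: 10 classes, 30 images each
s, q = sample_task(labels, n_way=5, k_shot=1, q_query=15)
# 5 support images (1 per class) and 75 query images
```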
Main Types of FSL Algorithms
• Transferring a standard classification model
  • Nearest neighbor / centroid
  • Fine-tuning
• Meta-learning
  • Metric-based
  • Optimization-based
Transferring Standard Classification Model
Baseline #1: Nearest centroid
1. Train a classifier for all base classes (1st stage)
2. Remove the last FC layer and get a feature encoder
3. In an FSL task: compute the mean feature of each class in the support set, then classify the query set by the nearest centroid
Baseline #1: Nearest centroid
• The mean feature is the “prototype” of a class
• It can also be viewed as the estimated weights of an FC layer
• Distance: squared Euclidean / cosine similarity
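A minimal numpy sketch of the nearest-centroid step (illustrative names; the features stand in for outputs of the frozen base-class encoder), using squared Euclidean distance:

```python
import numpy as np

def nearest_centroid_predict(support_feats, support_labels, query_feats):
    """Classify query features by nearest class-mean ("prototype").

    support_feats: (N*K, d) encoder features of the support set.
    """
    classes = np.unique(support_labels)
    # one prototype per class: the mean support feature
    protos = np.stack([support_feats[support_labels == c].mean(0) for c in classes])
    # squared Euclidean distance from each query to each prototype
    d2 = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[d2.argmin(1)]

# Toy check with well-separated 2-D features (stand-ins for encoder outputs)
sf = np.array([[0., 0.], [0.2, 0.], [5., 5.], [5., 5.2]])
sl = np.array([0, 0, 1, 1])
qf = np.array([[0.1, 0.1], [4.9, 5.1]])
pred = nearest_centroid_predict(sf, sl, qf)  # → [0, 1]
```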
Improving #1: Cosine Classifier
Use cosine similarity for both:
• Training (1st stage): replace the last FC layer with a cosine classifier
• Inference: cosine distance to the prototypes
The scale 𝜏 is a learnable parameter
Gidaris et al. Dynamic Few-Shot Visual Learning without Forgetting. CVPR 2018
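The cosine-classifier logits can be sketched as follows: normalize both features and class weights, then scale the cosine similarity by 𝜏 (learnable in the paper; fixed here for illustration). Names and toy values are assumptions.

```python
import numpy as np

def cosine_logits(features, weights, tau=10.0):
    """Cosine classifier: logit[i, c] = tau * cos(feature_i, weight_c).

    Replaces the usual dot-product FC layer; tau sharpens the softmax.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return tau * f @ w.T

feats = np.array([[3., 0.], [0., 0.5]])   # two query features
protos = np.array([[1., 0.], [0., 1.]])   # class weights (or prototypes)
logits = cosine_logits(feats, protos)
pred = logits.argmax(1)  # → [0, 1]
```

Note that because both sides are normalized, the magnitude of a feature no longer affects the decision, only its direction.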
Improving #1: Cosine Classifier
(Ablation study on the validation set under the generalized FSL setting; focus on the Novel-class results)
Baseline #2: Fine-tuning
1. Train a classifier for all base classes (1st stage)
2. In an FSL task: fine-tune on the support set
Fine-tuning the whole network may cause overfitting
Option: fine-tune only the last FC layer
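The last-layer-only option can be sketched as fitting a fresh linear softmax head on frozen support features by gradient descent on cross-entropy (a toy numpy sketch with assumed names and hyperparameters, not the paper's exact recipe):

```python
import numpy as np

def finetune_head(feats, labels, n_class, lr=0.5, steps=100):
    """Fit a new linear softmax head on frozen support features.

    The encoder stays fixed; only this small head is trained on the
    K shots, which limits (but does not eliminate) overfitting.
    """
    W = np.zeros((feats.shape[1], n_class))
    onehot = np.eye(n_class)[labels]
    for _ in range(steps):
        z = feats @ W
        p = np.exp(z - z.max(1, keepdims=True))   # stable softmax
        p /= p.sum(1, keepdims=True)
        W -= lr * feats.T @ (p - onehot) / len(feats)  # CE gradient step
    return W

sf = np.array([[1., 0.], [0.9, 0.1], [0., 1.], [0.1, 0.9]])  # toy features
sl = np.array([0, 0, 1, 1])
W = finetune_head(sf, sl, n_class=2)
pred = (sf @ W).argmax(1)  # → [0, 0, 1, 1]
```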
Improving #2: “Baseline++”
• Use a cosine classifier for both the 1st stage and fine-tuning
Chen et al. A Closer Look at Few-shot Classification. ICLR 2019
How to get a good representation for FSL?
Idea: let the learning objective describe our goal, i.e., directly optimize towards the FSL tasks
Meta-Learning for FSL
• Learn the model by optimizing towards FSL tasks sampled from the training set (base classes)
Meta-Learning for FSL
• A task: N-way K-shot (and Q-query), N×(K+Q) images
1. Sample a task (support-set + query-set) from the base classes
2. A model processes the support-set, then classifies samples in the query-set
3. Compute the loss of the query-set classification (using the ground truths), and optimize towards the loss
http://web.stanford.edu/class/cs330/
Meta-Learning for FSL
The key is Step 2: a differentiable algorithm
• A differentiable algorithm ↔ a meta-learning method
Meta-Learning for FSL
• Metric-based: Get features of the support-set, classify the query-set by feature comparison
• Optimization-based: the model is optimized on the support-set for a few steps, then classifies the query-set
Metric-Based Meta-Learning
Matching Network
• Get features of support / query images
• Classify query images by nearest neighbor (cosine distance)
Vinyals et al. Matching Networks for One Shot Learning. NeurIPS 2016
Prototypical Network
• Get the mean feature for each class in the support set
• Classify query images to the nearest class center
Snell et al. Prototypical Networks for Few-shot Learning. NeurIPS 2017
Prototypical Network
Simplifies the Matching Network. Differences:
1. Merge class features by averaging, instead of 1-to-1 matching
2. Squared Euclidean distance instead of cosine
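A single Prototypical-Network episode loss can be sketched in numpy (illustrative names; in real training this loss is backpropagated through the encoder that produced the features, which are given directly here):

```python
import numpy as np

def proto_loss(support, support_y, query, query_y):
    """Episode loss: softmax over negative squared Euclidean distances
    to class prototypes, then cross-entropy on the query labels."""
    classes = np.unique(support_y)
    protos = np.stack([support[support_y == c].mean(0) for c in classes])
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    logits = -d2                                   # closer prototype → larger logit
    m = logits.max(1, keepdims=True)               # stable log-softmax
    logp = logits - (m + np.log(np.exp(logits - m).sum(1, keepdims=True)))
    return -logp[np.arange(len(query_y)), query_y].mean()

# Toy episode: a query near class 0 should give a small loss for label 0
sf = np.array([[0., 0.], [1., 0.], [10., 10.], [11., 10.]])
sy = np.array([0, 0, 1, 1])
qf = np.array([[0.5, 0.]])
loss_correct = proto_loss(sf, sy, qf, np.array([0]))
loss_wrong = proto_loss(sf, sy, qf, np.array([1]))
```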
Relation Network
• Relation net g: a learnable comparison module
Sung et al. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018
Results
Metric-based meta-learning:
Learning how to do matching.
Optimization-Based Meta-Learning
MAML
• Learn an initialization θ that works well for per-task fine-tuning
Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
MAML
• The computation of the fine-tuning process is differentiable
• θ ← θ − β ∇θ L(θ − α ∇θ L(θ, S), Q)
  • S: support-set
  • Q: query-set
  • Requires a 2nd-order gradient
Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
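For a 1-D linear-regression model the MAML update above can be written in closed form, including the 2nd-order term, so the mechanics fit in a short sketch (a toy illustration with assumed task distribution and hyperparameters, not MAML as run on real networks):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w, x, y):
    # dL/dw for L(w) = mean((w*x - y)^2)
    return 2 * np.mean(x * (w * x - y))

def sample_task():
    # each task: fit y = a*x for a task-specific slope a
    a = rng.uniform(-2, 2)
    xs, xq = rng.normal(size=5), rng.normal(size=5)
    return (xs, a * xs), (xq, a * xq)

w, alpha, beta = 0.0, 0.05, 0.02
for _ in range(500):
    (xs, ys), (xq, yq) = sample_task()
    w_adapt = w - alpha * grad(w, xs, ys)          # inner step on the support set
    # outer gradient of L(w - alpha*g(w), Q) w.r.t. w: for this quadratic
    # loss, d2L/dw2 = 2*mean(x^2), so the 2nd-order factor is exact here
    outer = (1 - alpha * 2 * np.mean(xs**2)) * grad(w_adapt, xq, yq)
    w -= beta * outer                              # meta-update of the init
```

Dropping the `(1 - alpha * ...)` factor gives first-order MAML, which is the usual shortcut when the 2nd-order term is too expensive.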
MAML
• Works only for small networks
• Very task-dependent: performance can vary a lot across tasks
• Can perform worse than simple fine-tuning on larger datasets and networks
Summary of Few-Shot Learning
• Few-Shot learning is an important problem
• Meta-Learning makes the form of training/testing consistent
• Challenges
  • Scalability of the meta-learning algorithms
  • More practical settings: generalized FSL, any-shot / higher-shot
  • Discrepancy between base classes and novel classes
Zero-Shot Learning
Word2vec Embeddings
Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. 2013
Skip-gram model
DeViSE
Frome et al. DeViSE: A Deep Visual-Semantic Embedding Model. NeurIPS 2013
DeViSE
• Use the implicit relation between words with word embeddings
• How about using explicit relation in a knowledge graph?
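Before turning to knowledge graphs, the DeViSE-style inference step can be sketched: the vision model maps an image into the word-embedding space, and we predict the label whose embedding is most similar by cosine; unseen classes join for free via their embeddings. The tiny 3-d vectors below are stand-ins for 300-d word2vec embeddings (all names and values illustrative).

```python
import numpy as np

# Toy word embeddings (3-d stand-ins for 300-d word2vec vectors)
emb = {"cat": np.array([1., 0., 0.]),
       "dog": np.array([0., 1., 0.]),
       "tiger": np.array([0.9, 0.1, 0.4])}  # class unseen at training time

def zero_shot_predict(visual_emb, label_embs):
    """Predict the label whose word embedding is closest in cosine
    similarity to the image's projected embedding."""
    names = list(label_embs)
    M = np.stack([label_embs[n] / np.linalg.norm(label_embs[n]) for n in names])
    v = visual_emb / np.linalg.norm(visual_emb)
    return names[int((M @ v).argmax())]

# An image feature the (hypothetical) trained mapper places near "tiger"
pred = zero_shot_predict(np.array([0.85, 0.15, 0.45]), emb)  # → "tiger"
```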
Using Knowledge Graphs
• Never Ending Language Learning (NELL) Knowledge Graph
https://rtw.ml.cmu.edu
Zero-Shot Recognition
• Word Embedding + Knowledge Graph
• Graph Convolutional Network (GCN)
• Training classes: x₁, x₂; Testing class: x₃
(Figure: the GCN maps 300-d word embeddings to 2048-d classifier weights)
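A single GCN layer can be sketched as neighborhood averaging over the knowledge graph followed by a linear map and ReLU. Here the input rows hold word embeddings for all classes (seen and unseen); after several such layers the output rows serve as classifier weights, supervised only on the seen classes. The tiny dimensions below are illustrative (the real pipeline goes from 300-d embeddings to 2048-d weights).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^-1 (A + I) H W).

    A: adjacency matrix of the class graph; H: node features; W: weights.
    """
    A_hat = A + np.eye(len(A))               # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(1))      # row-normalize the propagation
    return np.maximum(0, D_inv @ A_hat @ H @ W)

A = np.array([[0., 1., 1.],   # toy knowledge-graph edges among 3 classes
              [1., 0., 0.],
              [1., 0., 0.]])
H = np.array([[1., 0.], [0., 1.], [1., 1.]])  # toy word embeddings (2-d)
W = np.ones((2, 4)) * 0.5                     # toy layer weights (2 → 4)
out = gcn_layer(A, H, W)                      # shape (3, 4): one row per class
```

Because unseen classes are connected to seen ones in the graph, propagation lets them inherit useful classifier structure even without any training images.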
Zero-Shot Recognition
(Results on test sets with 2.5K / 8.8K / 21K classes)
This Class
• Few-shot learning
• Meta-learning for few-shot learning
• Zero-shot learning