Aroof Aimen, 2018CSZ0001, ANN CS618, IIT Ropar, Punjab 140001
Instructor: Dr. CK Narayanan, CSE, IIT Ropar, Punjab 140001
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (MAML)
1
Project Presentation, 24th October 2019
2
Outline
• Introduction to meta-learning
• Prior Work
• Model-Agnostic Meta-Learning
  • Characteristics
  • Intuition
  • Supervised Learning
  • Gradient of Gradient
  • Reinforcement Learning
• Experiments
3
Introduction
• Definition of meta-learning: a learner is trained by a meta-learner so that it becomes able to learn many tasks.
• Goal: the learner quickly learns new tasks from a small amount of new data.
4
Prior Work
• Learning an update function or update rule
  • LSTM optimizer (Learning to learn by gradient descent by gradient descent; Optimization as a model for few-shot learning)
• Few-shot (or meta-) learning for specific tasks
  • Image classification (Matching Networks, Prototypical Networks)
  • Reinforcement learning
    • Benchmarking deep reinforcement learning for continuous control
• Memory-augmented meta-learning
  • Meta-learning with memory-augmented neural networks
8
Problem Setup (Definition & Goal)
• Goal: quickly adapt to new tasks from a task distribution with only a small amount of data and only a few gradient steps, even a single step.
• Learner: learns a new task using a single or very few gradient steps.
• Meta-learner: learns a generalized parameter initialization of the model.
9
Characteristics of MAML
• The MAML learner's weights are updated with standard gradient descent, rather than with a learned update rule.
• Requires no additional parameters and no particular learner architecture.
• Fast adaptability comes from a good parameter initialization.
• Model-agnostic (independent of the model architecture): works for classification, regression, and reinforcement learning.
10
Intuition of MAML
• The desired model parameter set θ is one such that applying one (or a small number of) gradient steps to θ on a new task produces maximally effective behavior.
• Find a θ that commonly decreases the loss of each task after adaptation (formalized below).
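Concretely, following Finn et al. (2017), the adapted parameters θ_i′ for a task T_i come from one gradient step on that task's loss, and θ is chosen to minimize the post-adaptation loss across tasks:

    θ_i′ = θ − ɑ·∇_θ L_{T_i}(f_θ)
    min_θ  Σ_{T_i ∼ p(T)}  L_{T_i}(f_{θ_i′})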
11
Supervised Learning
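As a concrete illustration of the supervised case, here is a minimal JAX sketch of MAML's inner adaptation step and meta-objective. The two-hidden-layer ReLU regressor and MSE loss mirror the regression setup later in this deck; the function names and initialization scale are illustrative assumptions, not the paper's exact configuration.

    import jax
    import jax.numpy as jnp

    def init_params(key, sizes=(1, 40, 40, 1)):
        # Two hidden layers of 40 units, as in the regression experiments.
        keys = jax.random.split(key, len(sizes) - 1)
        return [(0.1 * jax.random.normal(k, (m, n)), jnp.zeros(n))
                for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

    def mlp(params, x):
        for W, b in params[:-1]:
            x = jax.nn.relu(x @ W + b)
        W, b = params[-1]
        return x @ W + b

    def mse(params, x, y):
        return jnp.mean((mlp(params, x) - y) ** 2)

    def inner_update(params, x_s, y_s, alpha=0.01):
        # One gradient step on the support set: θ_i′ = θ − ɑ·∇_θ L(θ).
        grads = jax.grad(mse)(params, x_s, y_s)
        return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

    def maml_loss(params, x_s, y_s, x_q, y_q):
        # Query loss evaluated at the adapted parameters θ_i′.
        return mse(inner_update(params, x_s, y_s), x_q, y_q)

    # jax.grad differentiates THROUGH the inner update, so the meta-gradient
    # automatically includes the second-order (gradient-of-gradient) term.
    meta_grad_fn = jax.grad(maml_loss)

In practice, maml_loss is averaged over a meta-batch of tasks and θ is updated with Adam, as in the training details later in the deck.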
12
Gradient of Gradient
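Why "gradient of gradient": the meta-gradient differentiates the post-update loss with respect to the pre-update parameters. With θ_i′ = θ − ɑ·∇_θ L_{T_i}(f_θ), the chain rule gives

    ∇_θ L_{T_i}(f_{θ_i′}) = (I − ɑ·∇²_θ L_{T_i}(f_θ)) · ∇_{θ′} L_{T_i}(f_{θ′}) |_{θ′ = θ_i′}

so the meta-update involves a Hessian-vector product. Dropping the second-order term (approximating the Jacobian by I) yields the first-order approximation discussed in Finn et al. (2017).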
13
Reinforcement Learning
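In the RL setting of Finn et al. (2017), each task T_i is a Markov decision process with horizon H, and the task loss is the negative expected reward of the policy f_φ:

    L_{T_i}(f_φ) = − E_{x_t, a_t ∼ f_φ, q_{T_i}} [ Σ_{t=1..H} R_i(x_t, a_t) ]

Because this expectation cannot be differentiated in closed form, both the inner and the meta-gradients are estimated from sampled trajectories using policy-gradient methods, as described in the experiments.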
14
Experiment on Few-Shot Classification
● Omniglot (Lake et al., 2011)
  ○ 50 different alphabets, 1623 characters.
  ○ 20 instances of each character, drawn by 20 different people. 1200 characters for training, 423 for test.
● Mini-Imagenet (Ravi & Larochelle, 2017)
  ○ Classes per set: train=64, validation=12, test=24.
(A sketch of N-way, K-shot episode construction follows this list.)
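Both benchmarks are evaluated with N-way, K-shot episodes: N held-out classes, K labelled support examples per class, plus query examples for evaluation. A minimal sketch of episode construction, assuming hypothetical `images` and `labels` arrays for a meta-training split (the array names and query-set size are illustrative assumptions):

    import jax
    import jax.numpy as jnp

    def sample_episode(key, images, labels, n_way=5, k_shot=1, k_query=15):
        # Pick N classes, then K support + Q query examples per class.
        key_cls, key_perm = jax.random.split(key)
        class_ids = jax.random.choice(key_cls, jnp.unique(labels),
                                      shape=(n_way,), replace=False)
        xs, ys, xq, yq = [], [], [], []
        for new_label, c in enumerate(class_ids):
            key_perm, sub = jax.random.split(key_perm)
            idx = jnp.flatnonzero(labels == c)            # all examples of class c
            idx = jax.random.permutation(sub, idx)[:k_shot + k_query]
            xs.append(images[idx[:k_shot]])
            ys.append(jnp.full(k_shot, new_label))        # relabel classes 0..N-1
            xq.append(images[idx[k_shot:]])
            yq.append(jnp.full(k_query, new_label))
        return (jnp.concatenate(xs), jnp.concatenate(ys),
                jnp.concatenate(xq), jnp.concatenate(yq))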
Results of Few-Shot Classification
[Slides 15–16: accuracy tables, shown as figures in the original deck]
Experiments on Regression
• Sinusoid function: amplitude (A) and phase (ɸ) are varied between tasks
  • A in [0.1, 5.0]
  • ɸ in [0, π]
  • x in [-5.0, 5.0]
• Loss function: Mean Squared Error (MSE)
• Regressor: 2 hidden layers with 40 units and ReLU
• Training (a task-sampling sketch follows this list)
  • Use only 1 gradient step for the learner
  • K = 5 or 10 examples (5-shot or 10-shot learning)
  • Fixed inner step size (ɑ=0.01); Adam as the meta-optimizer
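A minimal sketch of the task sampling described above; the sign convention inside the sinusoid is an assumption:

    import jax
    import jax.numpy as jnp

    def sample_sinusoid_task(key, k=10):
        # One task = (amplitude A, phase ɸ) plus K labelled examples.
        k_a, k_p, k_x = jax.random.split(key, 3)
        A = jax.random.uniform(k_a, (), minval=0.1, maxval=5.0)
        phase = jax.random.uniform(k_p, (), minval=0.0, maxval=jnp.pi)
        x = jax.random.uniform(k_x, (k, 1), minval=-5.0, maxval=5.0)
        y = A * jnp.sin(x + phase)
        return x, y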
17
Results of 10-Shot Learning Regression
18
Results of 5-Shot Learning Regression
19
MAML Needs Only One Gradient Step
20
Experiments on Reinforcement Learning
• rllab benchmark suite.
• Neural network policy with two hidden layers of 100 units and ReLU.
• Gradient updates for the learner are computed with the vanilla policy gradient; trust-region policy optimization (TRPO) serves as the meta-optimizer (the estimator is sketched after the comparison list).
Comparison
• Pretraining one policy on all of the tasks, then fine-tuning
• Training a policy from randomly initialized weights
• An oracle policy
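For reference, the vanilla policy-gradient (REINFORCE) estimator used for the learner's adaptation step averages the score function over N sampled trajectories (the notation here is assumed, not taken from the slides):

    ∇_θ J(θ) ≈ (1/N) Σ_n Σ_t ∇_θ log π_θ(a_t^n | x_t^n) · R^n

where R^n is the total reward of trajectory n; TRPO then constrains each meta-update to a trust region in policy space.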
Results on Reinforcement Learning
21
Results on Reinforcement Learning
• Locomotion: high-dimensional locomotion tasks in the MuJoCo simulator
22
References
23
• Finn, C., Abbeel, P., & Levine, S. (2017, August). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70 (pp. 1126-1135). JMLR.org.
• Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., ... & De Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. In Advances in neural information processing systems (pp. 3981-3989).
• Ravi, S., & Larochelle, H. (2017). Optimization as a model for few-shot learning. In International Conference on Learning Representations (ICLR).
• Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems (pp. 4077-4087).
• Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. In Advances in neural information processing systems (pp. 3630-3638).
24