Meta-Learning with Implicit Gradients
Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine (2019)
Presented by Mayank Mittal



Page 1:

Meta-Learning with Implicit Gradients
Aravind Rajeswaran, Chelsea Finn, Sham Kakade, Sergey Levine (2019)

Mayank Mittal

Page 2:

Model-Agnostic Meta-Learning

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

▪ Learn a general-purpose internal representation that is transferable across different tasks

Meta-Learning objective: with tasks T_i drawn from a distribution over tasks P(T), loss function L_i, and task-specific parameters φ_i = Alg_i(θ),

  θ* = argmin_θ (1/M) Σ_{i=1}^M L_i( Alg_i(θ) )

Page 3:

Model-Agnostic Meta-Learning

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

Inner Level / Adaptation Step

Outer Level / Meta-Optimization Step
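The two levels above can be sketched end to end on a toy problem. The example below is a minimal sketch, not the paper's setup: it assumes quadratic task losses L_i(φ) = 0.5·‖φ − a_i‖², chosen so that both the adaptation step and the exact meta-gradient, including the dφ/dθ Jacobian factor, have closed forms; the task set, dimensions, and step sizes are illustrative assumptions.

```python
import numpy as np

# Toy MAML sketch (assumption: quadratic per-task losses
# L_i(phi) = 0.5 * ||phi - a_i||^2, so every derivative is analytic).
rng = np.random.default_rng(0)
tasks = [rng.normal(size=2) for _ in range(8)]   # task i is defined by its optimum a_i
alpha, eta = 0.1, 0.5                            # inner and outer step sizes
theta = np.zeros(2)                              # meta-parameters

def inner_step(theta, a):
    """One adaptation step: phi = theta - alpha * grad L_i(theta)."""
    return theta - alpha * (theta - a)

for _ in range(200):
    meta_grad = np.zeros_like(theta)
    for a in tasks:
        phi = inner_step(theta, a)
        # Chain rule through the inner step: d phi / d theta = (1 - alpha) * I,
        # so the exact meta-gradient is (1 - alpha) * grad L_i(phi).
        meta_grad += (1 - alpha) * (phi - a)
    theta -= eta * meta_grad / len(tasks)

# For this quadratic family, theta converges to the mean of the task optima.
print(theta, np.mean(tasks, axis=0))
```

For this family the Jacobian of one inner step is the scalar factor (1 − α) times the identity; that factor is exactly the second-order term that full MAML must backpropagate through the inner loop.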

Page 4:

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

Model-Agnostic Meta-Learning

Page 5:

Images from: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

Model-Agnostic Meta-Learning

Source of the problem: in MAML, gradients flow back over the iterative updates of the inner level, so
▪ the full inner optimization path must be stored
▪ meta-gradient computation is computationally expensive

Hence, heuristics such as first-order MAML and Reptile are used
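The first-order MAML heuristic can be sketched on the same assumed quadratic toy tasks (L_i(φ) = 0.5·‖φ − a_i‖², an illustrative assumption): it simply drops the dφ/dθ Jacobian and uses the gradient at the adapted parameters directly as the meta-gradient, so nothing along the inner path needs to be stored.

```python
import numpy as np

# First-order MAML (FOMAML) sketch on quadratic toy tasks
# (assumption: L_i(phi) = 0.5 * ||phi - a_i||^2). FOMAML treats the adapted
# parameters phi as if they did not depend on theta, i.e. it drops the
# d phi / d theta Jacobian and backpropagates only through L_i(phi).
rng = np.random.default_rng(0)
tasks = [rng.normal(size=2) for _ in range(8)]
alpha, eta = 0.1, 0.5
theta = np.zeros(2)

for _ in range(300):
    meta_grad = np.zeros_like(theta)
    for a in tasks:
        phi = theta - alpha * (theta - a)      # inner adaptation step
        meta_grad += (phi - a)                 # FOMAML: no (1 - alpha) Jacobian factor
    theta -= eta * meta_grad / len(tasks)

# For this toy family the dropped Jacobian is (1 - alpha) * I, a positive
# multiple of the identity, so FOMAML still reaches the exact-MAML optimum.
print(theta)
```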

Page 6:

Motivation

Can we make the meta-gradient independent of the inner optimization path?

Page 7:

Implicit MAML (iMAML)

▪ In MAML, Alg(θ) is an iterative gradient-descent procedure; the dependence of the model parameters on the meta-parameters shrinks and vanishes as the number of gradient steps increases

▪ Instead, iMAML uses an optimum solution of the regularized loss:

  Alg_i*(θ) = argmin_φ L̂_i(φ) + (λ/2) ‖φ − θ‖²

  The proximal regularization term ‖φ − θ‖² encourages dependency between the model parameters and the meta-parameters
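A minimal sketch of this regularized inner problem, assuming a quadratic task loss L̂(φ) = 0.5·φᵀAφ (the matrix A, λ, and the step size are illustrative assumptions) so that the exact proximal solution is available for comparison:

```python
import numpy as np

# iMAML inner level: minimize Lhat(phi) + (lam/2) * ||phi - theta||^2
# (assumption: quadratic Lhat(phi) = 0.5 * phi^T A phi, which makes the
# exact minimizer available in closed form for checking).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])   # symmetric positive-definite task Hessian
lam = 2.0                         # regularization strength lambda
rng = np.random.default_rng(1)
theta = rng.normal(size=3)

def grad_reg(phi):
    """Gradient of Lhat(phi) + (lam/2) * ||phi - theta||^2."""
    return A @ phi + lam * (phi - theta)

# Inner solve: plain gradient descent on the regularized objective.
phi = theta.copy()
for _ in range(2000):
    phi -= 0.1 * grad_reg(phi)

# Closed-form minimizer for the quadratic case: (A + lam*I) phi* = lam * theta
phi_star = np.linalg.solve(A + lam * np.eye(3), lam * theta)
print(np.max(np.abs(phi - phi_star)))
```

Note how the solution depends on θ through the regularizer even after the iterative solver has fully converged, which is exactly the dependency MAML loses with many inner steps.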

Page 8:

Implicit MAML (iMAML)

Issue: the meta-gradient still requires computing the Jacobian dφ_i/dθ of the inner solution with respect to the meta-parameters.

Solution: use the implicit Jacobian!

Page 9:

Implicit Jacobian Lemma

If φ_i = Alg_i*(θ) = argmin_φ L̂_i(φ) + (λ/2) ‖φ − θ‖², then

  dφ_i/dθ = ( I + (1/λ) ∇²L̂_i(φ_i) )^{-1}

Proof. Apply the stationary-point condition ∇L̂_i(φ_i) + λ(φ_i − θ) = 0 and differentiate with respect to θ:

  ∇²L̂_i(φ_i) (dφ_i/dθ) + λ (dφ_i/dθ − I) = 0, which rearranges to the claim.
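The lemma can be checked numerically, again assuming a quadratic inner loss with Hessian A (an illustrative choice, since then φ*(θ) has a closed form), by comparing the implicit Jacobian to a finite-difference Jacobian:

```python
import numpy as np

# Numerical check of the implicit Jacobian lemma
# (assumption: Lhat(phi) = 0.5 * phi^T A phi, so phi*(theta) is closed-form).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])   # Hessian of Lhat
lam = 2.0
I = np.eye(3)

def phi_star(theta):
    """Exact minimizer of Lhat(phi) + (lam/2) * ||phi - theta||^2."""
    return np.linalg.solve(A + lam * I, lam * theta)

# Lemma: d phi* / d theta = (I + Hessian / lam)^{-1}
J_implicit = np.linalg.inv(I + A / lam)

# Finite-difference Jacobian of phi_star, column by column.
theta = np.array([0.3, -1.0, 0.7])
eps = 1e-6
J_fd = np.column_stack([
    (phi_star(theta + eps * e) - phi_star(theta - eps * e)) / (2 * eps)
    for e in I
])
print(np.max(np.abs(J_implicit - J_fd)))
```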

Page 10:

Practical Algorithm

Issues:
▪ obtaining an exact solution to the inner-level optimization problem may not be feasible in practice
▪ computing the Jacobian by explicitly forming and inverting the matrix may be intractable for large networks

Meta-gradient:

  ∇F(θ) = (1/M) Σ_i ( I + (1/λ) ∇²L̂_i(φ_i) )^{-1} ∇L_i(φ_i)

Solution: use approximations!

Page 11:

Practical Algorithm

Solution: use approximations!

1. Approximate inner-level solution: run an iterative optimizer to an approximate minimizer of the regularized loss.

2. Approximate matrix-inversion product in the meta-gradient, ( I + (1/λ) ∇²L̂_i(φ_i) )^{-1} ∇L_i(φ_i): solved using the Conjugate Gradient (CG) method.
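The CG step can be sketched as follows, assuming a quadratic toy loss with Hessian A (an illustrative assumption); the point is that CG touches the matrix I + ∇²L̂/λ only through matrix-vector products, never forming or inverting it:

```python
import numpy as np

# Conjugate-gradient solve of (I + Hessian/lam) x = g using only
# Hessian-vector products, as iMAML does (quadratic toy Hessian A assumed).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
lam = 2.0

def hvp(v):
    """Hessian-vector product of the inner loss; for a real network this is
    a double-backward pass, never an explicit Hessian matrix."""
    return A @ v

def matvec(v):
    return v + hvp(v) / lam        # (I + A/lam) v

def conjugate_gradient(g, iters=50, tol=1e-10):
    """Standard CG for symmetric positive-definite systems."""
    x = np.zeros_like(g)
    r = g - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

g = np.array([1.0, -2.0, 0.5])     # stands in for grad of outer loss at phi*
x = conjugate_gradient(g)
# Compare against a direct dense solve (feasible only at toy scale).
x_direct = np.linalg.solve(np.eye(3) + A / lam, g)
print(np.max(np.abs(x - x_direct)))
```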

Page 12:

Practical Algorithm

Implicit gradient accuracy (valid under regularity assumptions): the error of the computed meta-gradient is controlled by the accuracy of the two approximations:

1. Approximate inner-level solution

2. Approximate matrix-inversion product, solved using the Conjugate Gradient (CG) method

Page 13:

Implicit MAML Algorithm

Inner Level / Adaptation Step

Outer Level / Meta-Optimization Step
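Putting the pieces together, here is a minimal end-to-end iMAML loop on the assumed quadratic toy tasks L_i(φ) = 0.5·‖φ − a_i‖² (task family and hyperparameters are illustrative assumptions): inner solve by gradient descent on the regularized loss, implicit meta-gradient by CG, outer SGD step.

```python
import numpy as np

# End-to-end iMAML sketch on quadratic toy tasks L_i(phi) = 0.5*||phi - a_i||^2.
rng = np.random.default_rng(0)
tasks = [rng.normal(size=2) for _ in range(8)]
lam, eta = 2.0, 0.5
theta = np.zeros(2)

def inner_solve(theta, a, lr=0.2, steps=200):
    """Approximately minimize L_i(phi) + (lam/2)*||phi - theta||^2 by GD."""
    phi = theta.copy()
    for _ in range(steps):
        phi -= lr * ((phi - a) + lam * (phi - theta))
    return phi

def cg(matvec, g, iters=50, tol=1e-10):
    """Conjugate gradient for SPD systems, using matrix-vector products only."""
    x = np.zeros_like(g); r = g - matvec(x); p = r.copy(); rs = r @ r
    for _ in range(iters):
        Ap = matvec(p); a = rs / (p @ Ap)
        x += a * p; r -= a * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p; rs = rs_new
    return x

for _ in range(100):
    meta_grad = np.zeros_like(theta)
    for a in tasks:
        phi = inner_solve(theta, a)                   # inner level / adaptation step
        outer_grad = phi - a                          # grad of L_i at phi
        # Implicit meta-gradient: solve (I + Hessian/lam) x = outer_grad.
        # The Hessian of this toy loss is the identity, so the HVP is v itself.
        x = cg(lambda v: v + v / lam, outer_grad)
        meta_grad += x
    theta -= eta * meta_grad / len(tasks)             # outer level / meta step

print(theta, np.mean(tasks, axis=0))
```

Notice that memory use is constant in the number of inner gradient steps: nothing along the inner optimization path is retained, only its endpoint.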

Page 14:

Computation Bounds

Theory: iMAML computes the meta-gradient to within a small error in a memory-efficient way.

Experiment: empirical comparison of the compute and memory required.

Page 15:

Results on few-shot learning

Page 16:

Future Work

▪ Broader classes of inner-loop optimization
  ▪ dynamic programming
  ▪ energy-based models
  ▪ actor-critic RL

▪ More flexible regularizers
  ▪ learn a vector- or matrix-valued λ (regularization coefficient)

Page 17:

References

▪ Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017)

▪ Meta-Learning with Implicit Gradients, Rajeswaran et al. (2019)

▪ Notes on iMAML, Ferenc Huszar (2019)

Page 18:

Thank you!