
Properties of Good Meta-Learning Algorithms (And How to Achieve Them)

Chelsea Finn, UC Berkeley & Google Brain → Stanford

Why Learn to Learn?
- effectively reuse data from other tasks
- replace manual engineering of architecture, hyperparameters, etc.
- learn to quickly adapt to unexpected scenarios (inevitable failures, long tail)
- learn how to learn with weak supervision

Problem Domains:
- few-shot classification & generation
- hyperparameter optimization
- architecture search
- faster reinforcement learning
- domain generalization
- learning structure
- …

Approaches:
- recurrent networks
- learning optimizers or update rules
- learning initial parameters & architecture
- acquiring metric spaces
- Bayesian models
- …

What is the meta-learning problem statement?

The Meta-Learning Problem

Supervised Learning:
Inputs: x    Outputs: y    Data: D = {(x_k, y_k)}
Learn f such that y = f(x).

Meta-Supervised Learning:
Inputs: D_train, x_test    Outputs: y_test    Data: {D_i}, where each D_i = {(x_k, y_k)}
Learn f such that y_test = f(D_train, x_test).

Why is this view useful? It reduces the problem to the design & optimization of f.

Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

Example: Few-Shot Classification

diagram adapted from Ravi & Larochelle ‘17

Given 1 example of each of 5 classes, classify new examples.

[Figure: during meta-training, many tasks are drawn from the training classes; each task has its own training data and test set.]
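To make the episodic setup concrete, here is a minimal sketch (not from the talk; names such as make_episode and the array shapes are illustrative assumptions) of how a 5-way, 1-shot task with its own training data and test set could be sampled from a pool of meta-training classes.

```python
# Illustrative sketch: building one 5-way, 1-shot episode from a pool of
# meta-training classes, each mapped to a list of example arrays.
import random
import numpy as np

def make_episode(class_pool, n_way=5, k_shot=1, n_query=5):
    """Sample an N-way K-shot task: a small training set and a test set."""
    classes = random.sample(list(class_pool.keys()), n_way)
    support_x, support_y, query_x, query_y = [], [], [], []
    for label, cls in enumerate(classes):
        examples = random.sample(class_pool[cls], k_shot + n_query)
        for x in examples[:k_shot]:          # "training data" for this task
            support_x.append(x); support_y.append(label)
        for x in examples[k_shot:]:          # "test set" for this task
            query_x.append(x); query_y.append(label)
    return (np.stack(support_x), np.array(support_y),
            np.stack(query_x), np.array(query_y))

# Example usage with random stand-in "images":
pool = {f"class_{i}": [np.random.rand(28, 28) for _ in range(20)] for i in range(50)}
x_tr, y_tr, x_ts, y_ts = make_episode(pool)   # 5 training examples, 25 test examples
```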

What do we want from our meta-learning algorithms?

- Expressive power: the ability for f to represent a range of learning procedures
- Consistency: the learned learning procedure will solve the task with enough data; result: reasonable performance on out-of-distribution tasks
- Uncertainty awareness: the ability to reason about ambiguity during learning
- No need to differentiate through learning: better scalability to long learning processes

Design of f? Black box approaches:

- Recurrent network (LSTM, NTM, Conv): Santoro et al. ’16, Duan et al. ’17, Wang et al. ’17, Munkhdalai & Yu ’17, Mishra et al. ’17, …
- Learned optimizer (often uses recurrence): Schmidhuber et al. ’87, Bengio et al. ’90, Hochreiter et al. ’01, Li & Malik ’16, Andrychowicz et al. ’16, Ha et al. ’17, Ravi & Larochelle ’17, …

Black box approaches deliver expressive power, but not consistency; a minimal sketch of the black-box interface follows below.
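As a rough illustration of the black box view (y_test = f(D_train, x_test)), the sketch below conditions a small network on the task's training set. Note the assumptions: the cited works use recurrent architectures (LSTM, NTM, temporal convolutions), whereas this sketch summarizes the training set with a simple mean over encoded (x, y) pairs purely to stay short; all sizes and names are illustrative.

```python
# Minimal black-box meta-learner sketch: y_test = f(D_train, x_test).
# The training set is summarized by averaging encoded (x, y) pairs; the cited
# works instead use recurrent or convolutional readers over the training set.
import jax
import jax.numpy as jnp

def init_blackbox(key, x_dim=1, y_dim=1, hidden=64):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "enc": jax.random.normal(k1, (x_dim + y_dim, hidden)) * 0.1,  # encodes (x, y) pairs
        "w1": jax.random.normal(k2, (x_dim + hidden, hidden)) * 0.1,
        "w2": jax.random.normal(k3, (hidden, y_dim)) * 0.1,
    }

def blackbox_predict(params, x_train, y_train, x_test):
    # Summarize the task's training data into a context vector.
    pairs = jnp.concatenate([x_train, y_train], axis=-1)        # (K, x_dim + y_dim)
    context = jnp.tanh(pairs @ params["enc"]).mean(axis=0)      # (hidden,)
    # Predict for each test input, conditioned on the context.
    ctx = jnp.broadcast_to(context, (x_test.shape[0], context.shape[0]))
    h = jnp.tanh(jnp.concatenate([x_test, ctx], axis=-1) @ params["w1"])
    return h @ params["w2"]                                      # (M, y_dim)
```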

Can we incorporate structure into the learning procedure?

Approaches that incorporate learning structure:

- Nearest neighbors (in a learned metric space): Koch ’15, Vinyals et al. ’16, Snell et al. ’17, Reed et al. ’17, Li et al. ’17, … (a small sketch follows below)
- Gradient descent (from a learned initialization): Finn et al. ’17, Grant et al. ’17, Reed et al. ’17, Li et al. ’17, …

Incorporating structure gives consistency. Can we get both expressive power and consistency?
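Below is a hedged sketch of the learned-metric idea in the spirit of matching networks (Vinyals et al. ’16): queries are classified by cosine similarity to support examples in a learned embedding space. The toy MLP embedding and the function names are assumptions for illustration; the actual methods use convolutional embeddings and additional machinery.

```python
# Sketch of nearest-neighbor classification in a learned metric space, in the
# spirit of matching networks. The embedding network here is a toy MLP.
import jax
import jax.numpy as jnp

def init_embedding(key, x_dim=784, hidden=64, out=32):
    k1, k2 = jax.random.split(key)
    return {"w1": jax.random.normal(k1, (x_dim, hidden)) * 0.05,
            "w2": jax.random.normal(k2, (hidden, out)) * 0.05}

def embed(params, x):
    return jnp.tanh(x @ params["w1"]) @ params["w2"]

def metric_nn_probs(params, x_support, y_support, x_query, n_way):
    """Classify queries by cosine similarity to the support embeddings."""
    s = embed(params, x_support)                       # (K*N, out)
    q = embed(params, x_query)                         # (M, out)
    s = s / (jnp.linalg.norm(s, axis=-1, keepdims=True) + 1e-8)
    q = q / (jnp.linalg.norm(q, axis=-1, keepdims=True) + 1e-8)
    sims = q @ s.T                                     # (M, K*N) cosine similarities
    # Soft attention over support labels, as in matching networks.
    attn = jax.nn.softmax(sims, axis=-1)
    onehot = jax.nn.one_hot(y_support, n_way)          # (K*N, N)
    return attn @ onehot                               # (M, N) class probabilities

# Meta-training would maximize the log-probability of query labels across many
# sampled episodes, learning the metric space end to end.
```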

Key idea: Train over many tasks to learn a parameter vector θ that transfers via fine-tuning.

Fine-tuning at test time, starting from the pretrained parameters θ, on the training data D^tr of the test task:
θ' ← θ − α ∇_θ L(θ, D^tr)

Model-Agnostic Meta-Learning (MAML): optimize θ so that this fine-tuning works well across tasks:
min_θ Σ_i L(θ − α ∇_θ L(θ, D^tr_i), D^test_i)

Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
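The sketch below is a minimal JAX rendering of these two equations for sinusoid regression: the inner update implements the fine-tuning step, and the meta-objective differentiates through it. Network sizes, learning rates, and the task distribution are illustrative assumptions, not the paper's exact setup.

```python
# Minimal MAML sketch in JAX for sinusoid regression (illustrative hyperparameters).
import jax
import jax.numpy as jnp

def init_params(key, sizes=(1, 40, 40, 1)):
    params = []
    for i in range(len(sizes) - 1):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (sizes[i], sizes[i + 1])) * 0.1,
                       jnp.zeros(sizes[i + 1])))
    return params

def forward(params, x):
    h = x
    for w, b in params[:-1]:
        h = jax.nn.relu(h @ w + b)
    w, b = params[-1]
    return h @ w + b

def loss(params, x, y):
    return jnp.mean((forward(params, x) - y) ** 2)

def inner_update(params, x_tr, y_tr, alpha=0.01):
    # Fine-tuning step: theta' = theta - alpha * grad_theta L(theta, D_tr)
    grads = jax.grad(loss)(params, x_tr, y_tr)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

def maml_task_loss(params, x_tr, y_tr, x_ts, y_ts):
    # Outer objective: loss of the *adapted* parameters on the task's test set.
    return loss(inner_update(params, x_tr, y_tr), x_ts, y_ts)

def sample_sinusoid_task(key, n=10):
    k1, k2, k3 = jax.random.split(key, 3)
    amp = jax.random.uniform(k1, (), minval=0.1, maxval=5.0)
    phase = jax.random.uniform(k2, (), minval=0.0, maxval=jnp.pi)
    x = jax.random.uniform(k3, (2 * n, 1), minval=-5.0, maxval=5.0)
    y = amp * jnp.sin(x + phase)
    return x[:n], y[:n], x[n:], y[n:]

@jax.jit
def meta_step(params, task_keys, beta=0.001):
    # Average the MAML objective over a batch of tasks, then take one
    # meta-gradient step (differentiating through the inner update).
    def batch_loss(p):
        losses = [maml_task_loss(p, *sample_sinusoid_task(task_keys[i]))
                  for i in range(task_keys.shape[0])]
        return jnp.mean(jnp.stack(losses))
    value, grads = jax.value_and_grad(batch_loss)(params)
    return jax.tree_util.tree_map(lambda p, g: p - beta * g, params, grads), value

key = jax.random.PRNGKey(0)
params = init_params(key)
for step in range(1000):
    key, subkey = jax.random.split(key)
    params, meta_loss = meta_step(params, jax.random.split(subkey, 4))
```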


Does consistency come at a cost?

Universal Function Approximation Theorem

A neural network with one hidden layer of finite width can approximate any continuous function. (Hornik et al. ’89, Cybenko ’89, Funahashi ’89)

How can we define a notion of expressive power for meta-learning?

A standard neural network is a “universal function approximator.” A black-box meta-learner (recurrent network, learned optimizer) applies such a network directly to (D_train, x_test), so it is a “universal learning procedure approximator.” Is the same true of MAML?

Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017

For a sufficiently deep f, the MAML function can approximate any function of (D_train, x_test).

Finn & Levine, ICLR 2018

Why is this interesting? MAML has the benefit of consistency without losing expressive power.

Like the recurrent-network meta-learner, MAML is a universal learning procedure approximator, under these assumptions:
- a nonzero learning rate α
- the loss function gradient does not lose information about the label
- the datapoints in D^tr are unique

Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

Scorecard so far: black box approaches offer expressive power; MAML offers both expressive power and consistency.

Empirically, what does consistency get you?

How well can methods generalize to similar, but extrapolated tasks?

[Plot: performance vs. task variability on Omniglot image classification, comparing MAML with TCML and MetaNetworks.]

The world is non-stationary.

Finn, Levine. Meta-learning and Universality: Deep Representation… ICLR 2018

[Plot: error vs. task variability on sinusoid curve regression, comparing MAML with TCML.]

Takeaway: strategies learned with MAML consistently generalize better to out-of-distribution tasks.


What do we want from our meta-learning algorithms? Returning to the checklist (expressive power, consistency, uncertainty awareness, no need to differentiate through learning): next, can we avoid differentiating through the learning process, for better scalability to long learning processes?

Model-Agnostic Meta-Learning (MAML): what if we stop the gradient through the inner gradient term ∇_θ L(θ, D^tr_i), so the meta-update uses only first-order information?

Performance often changes little on standard few-shot benchmarks. Note: in some tasks, we have observed a more substantial drop.

Finn, Abbeel, Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017
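One way to read "stop the gradient through this term" is the first-order approximation: treat the inner gradient as a constant when differentiating the outer objective. A toy sketch (with a deliberately tiny linear model; names are illustrative) is below.

```python
# First-order variant: stop the gradient through the inner-gradient term, so the
# outer update ignores second derivatives of the adaptation step.
import jax
import jax.numpy as jnp

def loss(params, x, y):
    w, b = params
    return jnp.mean((x @ w + b - y) ** 2)          # toy linear model for brevity

def fo_inner_update(params, x_tr, y_tr, alpha=0.01):
    grads = jax.grad(loss)(params, x_tr, y_tr)
    # stop_gradient: derivatives are not propagated through this term, so the
    # meta-gradient needs only first-order information.
    grads = jax.lax.stop_gradient(grads)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)

def fo_maml_task_loss(params, x_tr, y_tr, x_ts, y_ts):
    return loss(fo_inner_update(params, x_tr, y_tr), x_ts, y_ts)

# Differentiating fo_maml_task_loss w.r.t. params now treats d(grads)/d(theta) as
# zero, matching the first-order approximation discussed on the slide.
```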

What do we want from our meta-learning algorithms? Returning to the checklist again: the remaining property is uncertainty awareness, the ability to reason about ambiguity during learning.

No uncertainty awareness in original MAML

A Probabilistic Interpretation of MAML

That’s nice, but is it… Bayesian? Kind of… but it can be made more Bayesian.

Erin Grant

(Santos, 1996)

Grant, Finn, Levine, Darrell, Griffiths. Recasting gradient-based meta-learning as hierarchical Bayes. ICLR ‘18


A Probabilistic Interpretation of MAML: the inner gradient steps correspond to MAP inference in this hierarchical model. Can we do better than MAP? We can use a Laplace estimate, estimating the Hessian for neural nets with KFAC.

Grant, Finn, Levine, Darrell, Griffiths. Recasting gradient-based meta-learning as hierarchical Bayes. ICLR ‘18
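A toy sketch of the Laplace idea is below, under strong simplifying assumptions: the curvature term uses only the training-data loss (the prior term is dropped), the Hessian is computed exactly on a two-parameter model rather than with KFAC, and the damping constant is made up for illustration.

```python
# Toy sketch of the Laplace estimate: after MAP-style adaptation (a gradient
# step), approximate the objective by the test loss at the adapted parameters
# plus 0.5 * log det of the curvature there.
import jax
import jax.numpy as jnp

def loss(theta, x, y):
    # theta: flat parameter vector [w, b] of a 1D linear model (illustrative).
    pred = x * theta[0] + theta[1]
    return jnp.mean((pred - y) ** 2)

def adapt(theta, x_tr, y_tr, alpha=0.1):
    return theta - alpha * jax.grad(loss)(theta, x_tr, y_tr)    # MAP-style step

def laplace_objective(theta, x_tr, y_tr, x_ts, y_ts, damping=1e-3):
    theta_star = adapt(theta, x_tr, y_tr)
    nll = loss(theta_star, x_ts, y_ts)
    # Curvature of the task loss at the adapted parameters (exact here).
    hess = jax.hessian(loss)(theta_star, x_tr, y_tr)
    sign, logdet = jnp.linalg.slogdet(hess + damping * jnp.eye(theta.shape[0]))
    return nll + 0.5 * logdet

# Meta-training would minimize laplace_objective averaged over tasks, instead of
# the plain MAML objective, following the hierarchical-Bayes view.
```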

Modeling ambiguity

Can we sample classifiers?

Intuition: we want to learn a prior where a random kick can put us in different modes

(e.g., the given examples could correspond to “smiling, hat” or to “smiling, young”)

Meta-learning with ambiguity

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Sampling parameter vectors: approximate with MAP. This is extremely crude, but extremely convenient!

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning
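A rough rendering of the "random kick, then adapt" idea from the previous slides is sketched below: sample parameters from a learned Gaussian around the meta-learned parameters, then take a few gradient steps on the task's training data. This is a simplification of PLATIPUS (which is trained with variational inference); the toy model, noise scale, and names are illustrative.

```python
# Rough sketch of "sample, then adapt": draw a parameter vector from a learned
# Gaussian centered at the meta-learned parameters (the "random kick"), then
# take gradient steps on the (possibly ambiguous) training data.
import jax
import jax.numpy as jnp

def loss(theta, x, y):
    pred = x * theta[0] + theta[1]                 # toy 1D linear model
    return jnp.mean((pred - y) ** 2)

def sample_and_adapt(key, mu, log_sigma, x_tr, y_tr, alpha=0.1, steps=5):
    # Random kick: theta ~ N(mu, sigma^2), with mu and log_sigma meta-learned.
    theta = mu + jnp.exp(log_sigma) * jax.random.normal(key, mu.shape)
    # Adaptation: a few gradient steps on the task's training data.
    for _ in range(steps):
        theta = theta - alpha * jax.grad(loss)(theta, x_tr, y_tr)
    return theta

# Drawing several samples gives an ensemble of plausible adapted models, which is
# what lets the method express ambiguity about the underlying task.
key = jax.random.PRNGKey(0)
mu, log_sigma = jnp.zeros(2), jnp.log(0.5) * jnp.ones(2)
x_tr, y_tr = jnp.array([0.5, -1.0]), jnp.array([1.0, 0.3])
samples = [sample_and_adapt(k, mu, log_sigma, x_tr, y_tr)
           for k in jax.random.split(key, 4)]
```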

What does ancestral sampling look like?

(different samples correspond to different plausible concepts, e.g., “smiling, hat” vs. “smiling, young”)

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning

Sampling parameter vectors

PLATIPUS: Probabilistic LATent model for Incorporating Priors and Uncertainty in few-Shot learning

[Figures: ambiguous regression and ambiguous classification results.]

Finn*, Xu*, Levine. Probabilistic Model-Agnostic Meta-Learning


Related concurrent work

Kim et al. “Bayesian Model-Agnostic Meta-Learning”: uses Stein variational gradient descent for sampling parameters

Gordon et al. “Decision-Theoretic Meta-Learning”: unifies a number of meta-learning algorithms under a variational inference framework

What do we want from our meta-learning algorithms? One last pass over the checklist: expressive power, consistency, uncertainty awareness, and no need to differentiate through learning.

We can build meta-learning algorithms with many nice properties: expressive power, consistency, first-order operation that avoids differentiating through learning, and uncertainty awareness.

Collaborators: Sergey Levine, Pieter Abbeel, Tianhe Yu, Trevor Darrell, Tom Griffiths, Erin Grant, Tianhao Zhang, Josh Abbott, Josh Peterson, Annie Xie, Sudeep Dasari, Kelvin Xu

Questions?

Blog post, code, and papers: eecs.berkeley.edu/~cbfinn
