
CS330 Paper Presentation (October 16th, 2019): Meta-Learning for Semi-Supervised Few-Shot Classification (Ren et al., 2018)

Supervised Classification

Semi-Supervised Classification: a more realistic dataset (labelled + unlabelled examples)

Semi-Supervised Classification: the most “biologically plausible” learning regime

A familiar problem:


Few-shot, multi-task learning: generalize to unseen classes

A new twist on a familiar problem:


How can we leverage unlabelled data for few-shot classification?

Unlabelled data may come from the support-set classes or from other classes (distractors)

Strategy: As we can now appreciate, there are a number of possible ways to approach the original problem. To name a few:

● Siamese Networks (Koch et al., 2015)
● Matching Networks (Vinyals et al., 2016)
● Prototypical Networks (Snell et al., 2017)
● Weight initialization / update-step learning (Ravi et al., 2017; Finn et al., 2017)
● MANN (Santoro et al., 2016)
● Temporal convolutions (Mishra et al., 2017)

All are reasonable starting points for the semi-supervised few-shot classification problem!

Prototypical Networks (Snell et al., 2017): very simple inductive bias!

Prototypical Networks (Snell et al., 2017)

For each class, compute a prototype: the mean of its embedded support examples

Embedding is generated via a simple convnet: pixels → 64 [3×3] filters → batchnorm → ReLU → [2×2] max pool → 64-D vector

https://jasonyzhang.com/convnet/
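A sketch of this embedding block in PyTorch. The block follows the slide's description; the four-block stacking and names are assumptions based on the standard Conv-4 protonet embedding, not the authors' exact code:

```python
import torch.nn as nn

def conv_block(in_channels: int, out_channels: int = 64) -> nn.Sequential:
    """One embedding block: 3x3 conv (64 filters) -> batchnorm -> ReLU -> 2x2 max pool."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

# Stacking four such blocks on 28x28 single-channel Omniglot images
# yields a 64 x 1 x 1 feature map, i.e. a 64-D embedding vector.
embedding_net = nn.Sequential(
    conv_block(1), conv_block(64), conv_block(64), conv_block(64), nn.Flatten()
)
```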


Prototypical Networks (Snell et al., 2017)

For each class, compute prototype

Softmax distribution over (negative) distances to prototypes for a new image

Compute loss

Very simple inductive bias: reduces to a linear model with Euclidean distance
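A minimal NumPy sketch of the episode computation just described (the embedding network is assumed to have been applied already; function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def protonet_episode_loss(support_emb, support_labels, query_emb, query_labels, n_classes):
    """Prototypical-network loss for one episode.

    support_emb: (S, D) embeddings of labelled support examples
    support_labels: (S,) integer class ids in [0, n_classes)
    query_emb: (Q, D) embeddings of query examples
    query_labels: (Q,) integer class ids
    """
    # 1. Prototype = mean embedding of each class's support examples
    prototypes = np.stack([
        support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)
    ])                                                      # (N, D)

    # 2. Squared Euclidean distance from each query to each prototype
    d2 = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (Q, N)

    # 3. Softmax over negative distances -> class probabilities
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    # 4. Cross-entropy loss on the query set
    return -np.log(probs[np.arange(len(query_labels)), query_labels] + 1e-12).mean()
```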

Strategy for semi-supervised: refine prototype centers with unlabelled data.

(Figure: episode with Support, Unlabelled, and Test examples)

1. Start with labelled prototypes
2. Give each unlabelled input a partial assignment to each cluster
3. Incorporate unlabelled examples into the original prototypes

Strategy for semi-supervised:

Prototypical networks with Soft k-means

(Figure: unlabelled support set given a partial assignment to each prototype)
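A NumPy sketch of one soft k-means refinement step as described on this slide (names and the helper are my own, not the paper's code):

```python
import numpy as np

def soft_kmeans_refine(prototypes, support_emb, support_labels, unlabelled_emb):
    """One step of soft k-means prototype refinement.

    prototypes: (N, D) initial class prototypes from the labelled support set
    support_emb / support_labels: labelled support embeddings and class ids
    unlabelled_emb: (M, D) embeddings of the unlabelled examples
    """
    n_classes, _ = prototypes.shape

    # Partial assignment of each unlabelled point: softmax over negative squared distances
    d2 = ((unlabelled_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (M, N)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    z = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)             # (M, N)

    # Refined prototype: weighted mean of labelled (weight 1) and unlabelled (weight z) points
    refined = np.zeros_like(prototypes)
    for c in range(n_classes):
        labelled = support_emb[support_labels == c]
        num = labelled.sum(axis=0) + (z[:, c:c + 1] * unlabelled_emb).sum(axis=0)
        den = len(labelled) + z[:, c].sum()
        refined[c] = num / den
    return refined
```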

What about distractor classes?


Prototypical networks with Soft k-means w/ Distractor Cluster

Add a buffering prototype at the origin to “capture the distractors”


Assumption: Distractors all come from one class!
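As a rough sketch, the distractor cluster can be added by appending a zero prototype before refinement (continuing the illustrative soft_kmeans_refine above, and ignoring the per-cluster length scales the paper also introduces):

```python
import numpy as np

# Buffering prototype at the origin: far-away unlabelled points (distractors)
# place most of their soft-assignment mass here instead of pulling on real classes.
origin = np.zeros((1, prototypes.shape[1]))
prototypes_plus = np.vstack([prototypes, origin])

refined_plus = soft_kmeans_refine(prototypes_plus, support_emb,
                                  support_labels, unlabelled_emb)
refined = refined_plus[:-1]   # drop the distractor prototype before classification
```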

Soft k-means + Masking Network

1. Compute distances to the prototypes

2. Compute a mask with a small network

The masking operation is fully differentiable.

In practice, the MLP is a dense layer with 20 hidden units (tanh nonlinearity).
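A rough NumPy sketch of the masking step; `mlp` is a stand-in for the small dense network mentioned above, and the statistics shown are a subset of those in the paper (this is an illustrative simplification, not the authors' implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_soft_assignments(unlabelled_emb, prototypes, mlp):
    """Soft k-means assignments gated by a learned per-cluster mask (simplified sketch)."""
    # Squared distances of unlabelled points to each prototype
    d2 = ((unlabelled_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)   # (M, N)
    # Normalize per cluster so the statistics are scale-free
    d_norm = d2 / d2.mean(axis=0, keepdims=True)

    # Per-cluster summary statistics fed to the small MLP
    # (the paper uses several such statistics; only three shown here)
    stats = np.stack([d_norm.min(axis=0), d_norm.max(axis=0), d_norm.var(axis=0)], axis=1)
    beta, gamma = mlp(stats)   # learned soft threshold and slope per cluster, each shape (N,)

    # Soft, differentiable mask: ~1 for points within the threshold, ~0 for likely distractors
    mask = sigmoid(-gamma * (d_norm - beta))                                     # (M, N)

    # Usual soft k-means assignment, gated by the mask
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    z = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return z * mask
```

The gated assignments would then replace the plain soft assignments in the prototype refinement sketched earlier.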

Datasets
● Omniglot
● miniImageNet (100 classes with 600 images each)

Hierarchical Datasets

Omniglot

tieredImageNet

miniImageNet: Test - electric guitar; Train - acoustic guitar

tieredImageNet: Test - musical instruments; Train - farming equipment

tieredImageNet

Datasets
● Omniglot
● miniImageNet (100 classes with 600 images each)
● tieredImageNet (34 broad categories, each containing 10 to 30 classes)

10% of each class's images form the labelled split; 90% form the unlabelled split (used for unlabelled examples and distractors)*

*40/60 split for miniImageNet


Much less labelled data than standard few-shot approaches!!!

Datasets
N: classes
K: labelled samples from each class
M: unlabelled samples from each of the N classes
H: distractor classes (unlabelled samples from classes other than the N episode classes)

H = N = 5; M = 5 for training and M = 20 for testing
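For concreteness, a hypothetical sketch of how an episode with these quantities could be assembled (data structures and names are assumptions, not the paper's code; the query set is omitted):

```python
import random

def sample_episode(labelled_split, unlabelled_split, n=5, k=1, m=5, h=5):
    """Assemble one semi-supervised few-shot episode.

    labelled_split / unlabelled_split: dicts mapping class id -> list of images.
    n: episode classes, k: labelled shots per class,
    m: unlabelled examples per class, h: distractor classes.
    """
    classes = random.sample(sorted(labelled_split), n + h)
    episode_classes, distractor_classes = classes[:n], classes[n:]

    # Labelled support set: k examples from each of the n episode classes
    support = [(img, c) for c in episode_classes
               for img in random.sample(labelled_split[c], k)]

    # Unlabelled set: m examples per episode class plus m per distractor class
    unlabelled = [img for c in episode_classes + distractor_classes
                  for img in random.sample(unlabelled_split[c], m)]
    return support, unlabelled
```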


Baseline Models
1. Vanilla Protonet
2. Vanilla Protonet + one step of soft k-means refinement at test only (supervised embedding)

Results: Omniglot

Results: miniImageNet

Results: tieredImageNet

Results: Other Baselines

Results
Models trained with M = 5
During meta-test: vary the amount of unlabelled examples

Results

Conclusions:
1. Achieve state-of-the-art performance over logical baselines on 3 datasets
2. K-means masked models perform best with distractors
3. Novel: models extrapolate to increases in the amount of unlabelled data
4. New dataset: tieredImageNet

Critiques:
1. Results are convincing, but the work is actually a relatively straightforward application of (a) Protonets and (b) k-means clustering
2. Model choice: Protonets are very simple. It's not clear what they gained by the simple inductive bias
3. The presented approach does not generalize well beyond classification problems

Future directions: extension to unsupervised learning

I would be really interested in withholding labels altogether. Can the model learn how many classes there are?

… and correctly classify them?


Thank you!

Supplemental: Accounting for Intra-Cluster Distance
