Meta-Learning for Low Resource NMT

Page 1: Meta-Learning for Low Resource NMT

Meta-Learning for Low Resource NMT

Page 2: Meta-Learning for Low Resource NMT

Introduction

● Historically, machine translation was dominated by statistical methods

● Neural machine translation (NMT) has recently outperformed statistical systems on high-resource language pairs

● Statistical models still outperform NMT on low-resource language pairs

Page 3: Meta-Learning for Low Resource NMT

NMT Previous Work

● Single task, mixed datasets

● Monolingual corpora

● Direct transfer learning

Page 4: Meta-Learning for Low Resource NMT

Meta Learning in NMT

Idea:

Improve on direct transfer learning by better fine-tuning

Page 5: Meta-Learning for Low Resource NMT

MAML for NMT

[Figure: 17 high-resource languages (e.g. Danish, French, Spanish, Greek, Polish, Portuguese, Italian) and 4 low-resource languages (Turkish, Romanian, Latvian, Finnish)]

Page 6: Meta-Learning for Low Resource NMT

[Figure, repeated: the same languages, now annotated. Meta-train on the 17 high-resource pairs (e.g. Spanish→English); meta-test on the 4 low-resource pairs (e.g. Turkish→English)]

Note: they simulate the low-resource setting by sub-sampling

Page 7: Meta-Learning for Low Resource NMT

Gradient Update

Meta-Gradient Update
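Written out, the two updates on this slide follow the standard MAML recipe (a sketch in the usual MAML notation; the symbols α, β, and the loss names are assumptions, not copied from the slides):

```latex
% Gradient update: adapt to task T_k with one (or a few) SGD steps
\theta'_k = \theta - \alpha \,\nabla_\theta \mathcal{L}_{T_k}(\theta)

% Meta-gradient update: evaluate the adapted parameters on held-out
% data from each task and backpropagate through the adaptation step
\theta \leftarrow \theta - \beta \,\nabla_\theta \sum_k \mathcal{L}'_{T_k}(\theta'_k)
```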

Page 8: Meta-Learning for Low Resource NMT

Gradient Update

1st-order Approximate Meta-Gradient Update

Meta-Gradient Update
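The first-order approximation drops the second-derivative term that the exact meta-gradient requires (again a sketch in standard MAML notation, with α and the loss symbols assumed):

```latex
% Exact meta-gradient (chain rule through the inner update):
\nabla_\theta \mathcal{L}'_{T_k}(\theta'_k)
  = \bigl(I - \alpha \,\nabla^2_\theta \mathcal{L}_{T_k}(\theta)\bigr)\,
    \nabla_{\theta'_k} \mathcal{L}'_{T_k}(\theta'_k)

% 1st-order approximation: ignore the Hessian term
\nabla_\theta \mathcal{L}'_{T_k}(\theta'_k)
  \approx \nabla_{\theta'_k} \mathcal{L}'_{T_k}(\theta'_k)
```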

Page 9: Meta-Learning for Low Resource NMT

Issue: meta-train and meta-test input spaces should match!

Meta-train (Spanish): "En un lugar de la Mancha, de cuyo nombre no puedo..." → "In some place in la Mancha, whose name..."

Meta-test (Turkish): "Benim adım kırmızı..." → "My name is Red..."

But the Spanish embedding for "nombre" and the Turkish embedding for "adım" are trained independently, in unrelated embedding spaces.

Page 10: Meta-Learning for Low Resource NMT

Universal Lexical Representation

Spanish Word Embeddings

English Word Embeddings

French Word Embeddings

Turkish Word Embeddings

Word embeddings trained independently on monolingual corpora

Page 11: Meta-Learning for Low Resource NMT

Universal Lexical Representation

Spanish Word Embeddings

English Word Embeddings

French Word Embeddings

Turkish Word Embeddings

Word embeddings trained independently on monolingual corpora

Universal Embedding Values

Universal Embedding Keys

Page 12: Meta-Learning for Low Resource NMT

Universal Lexical Representation

Universal Embedding Keys(transposed)

Universal Embedding Values

nombre

Spanish Word Embeddings

Transformation Matrix

Key: We represent "nombre" as a linear combination of tokens in the ULR!

And, these are the weights of the linear combination!
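The lookup described above can be sketched numerically: transform the language-specific embedding, attend over the universal keys, and mix the universal values. This is a minimal NumPy sketch, not the authors' code; the function name, dimensions, and the temperature `tau` are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def universal_representation(word_emb, A, keys, values, tau=0.05):
    """Project a language-specific embedding into the universal space.

    word_emb : (d,)   monolingual embedding (e.g. Spanish "nombre")
    A        : (d, d) per-language transformation matrix
    keys     : (M, d) universal embedding keys
    values   : (M, d) universal embedding values
    """
    query = word_emb @ A                   # transform into the key space
    weights = softmax(keys @ query / tau)  # weights over universal tokens
    return weights @ values                # linear combination of values

# Toy example with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
d, M = 8, 20
keys = rng.normal(size=(M, d))
values = rng.normal(size=(M, d))
A_es = rng.normal(size=(d, d))     # hypothetical Spanish transformation
nombre = rng.normal(size=(d,))     # stand-in for the embedding of "nombre"

ulr_nombre = universal_representation(nombre, A_es, keys, values)
print(ulr_nombre.shape)  # (8,)
```

Because the weights come from a softmax, they sum to 1, and every language's tokens end up expressed in the same universal value space.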

Page 13: Meta-Learning for Low Resource NMT

Universal Lexical Representation

Universal Embedding Keys(transposed)

Universal Embedding Values

adım

Turkish Word Embeddings

Transformation Matrix

Same embedding space as Spanish!

Page 14: Meta-Learning for Low Resource NMT

Training

Universal Embedding Keys(transposed)

Universal Embedding Values

nombre

Spanish Word Embeddings

Transformation Matrix

(Figure labels: the transformation matrix is trainable; the pre-trained embeddings are fixed.)

Page 15: Meta-Learning for Low Resource NMT

Experiments

Page 16: Meta-Learning for Low Resource NMT

Experiments

Comment: best to leave the decoder be! Why?

Page 17: Meta-Learning for Low Resource NMT
Page 18: Meta-Learning for Low Resource NMT

Comment: the gap narrows as more training examples are included

Page 19: Meta-Learning for Low Resource NMT

Critique: they don't evaluate on any real low-resource languages!

Critique: we don't know how many training examples there are per task. It's k-shot, but what is k?