Mimicking Word Embeddings using Subword RNNs
Yuval Pinter, Robert Guthrie, Jacob Eisenstein (@yuvalpi)
Presented at EMNLP, September 2017, Copenhagen


The Word Embedding Pipeline

[Pipeline diagram]
Unlabeled corpus (Wikipedia, GigaWord, Reddit, ...)
→ Embedding model (vectors) (W2V, GloVe, Polyglot, FastText, ...)
→ Supervised task corpus (Penn TreeBank, SemEval, OntoNotes, Univ. Dependencies, ...)
→ Task model (Tagging, Parsing, Sentiment, NER, ...)

Assumed Pattern
[Venn diagram: supervised-task text nested inside unlabeled text, nested inside all possible text]

Actual Pattern
[Venn diagram: supervised-task text only partly overlaps unlabeled text within all possible text]
● No pre-trained vectors
● Affects supervised tasks
● Multiple treatments suggested
● Our method: a compositional subword OOV model

Sources of OOVs
● Names: "Chalabi has increasingly marginalized within Iraq, ..."
● Domain-specific jargon: "Important species (...) include shrimp, (...) and some varieties of flatfish."
● Foreign words: "This term was first used in German (Hochrenaissance), ..."
● Rare morphological derivations: "Without George Martin the Beatles would have been just another untalented band as Oasis."
● Nonce words: "What if Google morphed into GoogleOS?"
● Nonstandard orthography: "We'll have four bands, and Big D is cookin'. lots of fun and great prizes."
● Typos and other errors: "I dislike this urban society and I want to leave this whole enviroment."
● ...: ???

Common OOV Handling Techniques
[Diagram: an OOV word falls outside both the unlabeled corpus and the supervised task corpus]
● None (random init)
● One UNK to rule them all (see the sketch below)
  ○ Average existing embeddings
  ○ Trained with embeddings (stochastic unking)
● Add a subword model during WE training
  ○ Bhatia et al. (2016), Wieting et al. (2016)
  ○ What if we don't have access to the original corpus? (e.g. FastText)
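The two UNK baselines are simple enough to sketch. A minimal illustration, assuming `emb_matrix` is a numpy matrix of pre-trained vectors and `freq` a word-frequency table; the names and the unking rate `p` are assumptions, not values from the talk:

```python
import random

# (a) One UNK to rule them all: average the existing embedding rows.
unk_vec = emb_matrix.mean(axis=0)

# (b) Stochastic unking: during supervised training, sometimes replace rare
# words with <UNK> so the UNK embedding gets trained in real contexts.
def maybe_unk(word, freq, p=0.5):
    if freq[word] == 1 and random.random() < p:
        return "<UNK>"
    return word
```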

Char2Tag
[Diagram: the OOV word is handled inside the supervised task model]
● Add a subword layer to the supervised task
  ○ Ling et al. (2015), Plank et al. (2016)
● OOVs benefit from the co-trained character model
● Requires a large supervised training set for efficient transfer to test-set OOVs

Enter MIMICK
● What data do we have, post-unlabeled corpus?
  ○ A vector dictionary
  ○ Orthography (the way words are spelled)
● Use the former as the training objective, the latter as input
● Pre-trained vectors as target
  ○ No need to access the original unlabeled corpus
  ○ Many training examples
  ○ (No context)
● Subword units as inputs
  ○ Very extensible
  ○ (Character inventory changes?)

MIMICK Training
[Architecture diagram, for the in-vocabulary word "make": character embeddings for m, a, k, e → forward LSTM and backward LSTM → multilayer perceptron → mimicked embedding; L2 loss against the pre-trained embedding of "make" (Polyglot/FastText/etc.). A code sketch follows.]
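As a minimal sketch of the architecture on this slide: the authors' released implementation is in DyNet (see the repo linked at the end); this PyTorch version and its layer sizes are illustrative assumptions, not the paper's exact hyperparameters.

```python
import torch
import torch.nn as nn

class Mimick(nn.Module):
    def __init__(self, n_chars, char_dim=20, lstm_dim=50, hidden_dim=60, emb_dim=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)   # character embeddings
        self.bilstm = nn.LSTM(char_dim, lstm_dim, bidirectional=True, batch_first=True)
        self.mlp = nn.Sequential(                         # multilayer perceptron
            nn.Linear(2 * lstm_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, emb_dim),
        )

    def forward(self, char_ids):                          # (batch, word_length)
        _, (h_n, _) = self.bilstm(self.char_emb(char_ids))
        word_repr = torch.cat([h_n[0], h_n[1]], dim=-1)   # final fwd + bwd states
        return self.mlp(word_repr)                        # mimicked embedding

# Training objective: drive the mimicked vector toward the pre-trained one.
# MSELoss is the slide's L2 loss up to squaring and a scaling constant.
model, loss_fn = Mimick(n_chars=128), nn.MSELoss()
```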

MIMICK Inference
[Same network applied to an OOV word, e.g. "blah": character embeddings for b, l, a, h → forward LSTM and backward LSTM → multilayer perceptron → mimicked embedding]
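At inference time the trained network maps any character string to a vector. A hypothetical usage of the sketch above, where `char_to_id` is a lookup built during training:

```python
def embed_oov(model, word, char_to_id):
    # id 0 is reserved here for characters unseen at training time (an assumption)
    ids = torch.tensor([[char_to_id.get(c, 0) for c in word]])
    with torch.no_grad():
        return model(ids).squeeze(0)

blah_vec = embed_oov(model, "blah", char_to_id)
```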

Observation – Nearest Neighbors
● English (OOV → nearest in-vocab words)
  ○ MCT → AWS, OTA, APT, PDM
  ○ pesky → euphoric, disagreeable, horrid, ghastly
  ○ lawnmower → tradesman, bookmaker, postman, hairdresser
● Hebrew (OOV → nearest, glossed)
  ○ תפתור תתג: "(she/you-3p.sg.) will come true" → "(she/you-3p.sg.) will solve"
  ○ טריים גיאומטריים: geometric (m.pl., nontraditional spelling) → geometric (m.pl.)
  ○ ארך רדסון’ריצ: Richardson → Eustrach
● ✔ Surface form ✔ Syntactic properties ✘ Semantics
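Lists like these can be reproduced with cosine similarity between a mimicked vector and the pre-trained dictionary. A small sketch, assuming a numpy matrix `emb_matrix` whose rows align with a `vocab` list:

```python
import numpy as np

def nearest_in_vocab(oov_vec, emb_matrix, vocab, k=4):
    sims = emb_matrix @ oov_vec
    sims /= np.linalg.norm(emb_matrix, axis=1) * np.linalg.norm(oov_vec) + 1e-8
    return [vocab[i] for i in np.argsort(-sims)[:k]]

nearest_in_vocab(blah_vec.numpy(), emb_matrix, vocab)  # hypothetical call
```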

Intrinsic Evaluation – RareWords
● RareWords similarity task: morphologically-complex, mostly unseen words
● (Of the OOV sources listed earlier, these are chiefly rare(-ish) morphological derivations)

Nearest: programmatic, transformational, mechanistic, transactional, contextual
NN FUN!!!

Extrinsic Evaluation – POS + Attribute Tagging
● UD is annotated for POS and morphosyntactic attributes
  ○ Eng: his stated goals → Tense=Past|VerbForm=Part
  ○ Cze: osoby v pokročilém věku ("people of advanced age") → Animacy=Inan|Case=Loc|Degree=Pos|Gender=Masc|Negative=Pos|Number=Sing
● POS model from Ling et al. (2015)
  [Diagram: word embeddings for "the cat is sitting" → forward and backward LSTMs → tags DT NN VBZ VBG]
● Attributes: same as the POS layer
  [Diagram adds parallel output layers per attribute, e.g. Number (-- sing -- --) and Tense (-- -- pres --); a code sketch follows]
● Negative effect on POS
● Attribute evaluation metric
  ○ Micro F1
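A minimal sketch of this tagger shape: a biLSTM over word vectors with a POS softmax plus one parallel output layer per attribute. The dimensions and the attribute inventory are assumptions, not the paper's settings.

```python
import torch.nn as nn

class Tagger(nn.Module):
    def __init__(self, emb_dim=64, lstm_dim=100, n_pos=17, attr_sizes=None):
        super().__init__()
        attr_sizes = attr_sizes or {"Number": 3, "Tense": 4}  # assumed inventory
        self.bilstm = nn.LSTM(emb_dim, lstm_dim, bidirectional=True, batch_first=True)
        self.pos_head = nn.Linear(2 * lstm_dim, n_pos)
        # one output layer per morphosyntactic attribute, shaped like the POS layer
        self.attr_heads = nn.ModuleDict(
            {attr: nn.Linear(2 * lstm_dim, size) for attr, size in attr_sizes.items()})

    def forward(self, word_vecs):                  # (batch, sent_len, emb_dim)
        h, _ = self.bilstm(word_vecs)              # contextualized word states
        return self.pos_head(h), {a: head(h) for a, head in self.attr_heads.items()}
```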

Language Selection
● |UD ∩ Polyglot| = 44; we took 23
● Morphological structure
  ○ 12 fusional
  ○ 3 analytic
  ○ 1 isolating
  ○ 7 agglutinative
● Genealogical diversity
  ○ 13 Indo-European (from 7 different branches)
  ○ 10 from 8 non-IE branches
● MRLs (e.g. Slavic languages)
  ○ Much word-level data
  ○ Relatively free word order

[Illustration: Minna Sundberg's language family tree]

Language Selection (contd.)
● Script type
  ○ 7 in non-alphabetic scripts
  ○ Ideographic (Chinese): ~12K characters
  ○ Hebrew, Arabic: no casing, no vowels, syntactic fusion
  ○ Vietnamese: tokens are non-compositional syllables
● Attribute-carrying tokens
  ○ Range from 0% (Vietnamese) to 92.4% (Hindi)
● OOV rate (UD against the Polyglot vocabulary; computed as sketched below)
  ○ 16.9%–70.8% type-level (median 29.1%)
  ○ 2.2%–33.1% token-level (median 9.2%)
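How the two OOV rates differ, as a small sketch (assuming `ud_tokens` is the list of UD tokens for one language and `polyglot_vocab` a set of in-vocabulary types):

```python
def oov_rates(ud_tokens, polyglot_vocab):
    types = set(ud_tokens)
    type_rate = sum(t not in polyglot_vocab for t in types) / len(types)          # unique words
    token_rate = sum(t not in polyglot_vocab for t in ud_tokens) / len(ud_tokens)  # running text
    return type_rate, token_rate
```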

Evaluated Systems
● NONE: Polyglot's default UNK embedding
● MIMICK
● CHAR2TAG: an additional char-RNN layer in the tagger
  ○ 3× training time
● BOTH: MIMICK + CHAR2TAG
[Diagram: per-word char-LSTMs feeding the tagger over the sentence "the flatfish is sitting"]

Results - Full Data
[Bar charts: POS tag accuracy and morphological attribute micro F1, comparing NONE, MIMICK, CHAR2TAG, and BOTH]

Results - 5,000 training tokens
[Bar charts: POS tag accuracy and morphological attribute micro F1, comparing NONE, MIMICK, CHAR2TAG, and BOTH]

Results - Language Types (5,000 tokens)
[Bar charts comparing NONE, MIMICK, CHAR2TAG, and BOTH: POS accuracy on Slavic languages; morphological attribute F1 on agglutinative languages]

Results - Chinese
[Bar charts: POS tag accuracy and morphological attribute micro F1, comparing NONE, MIMICK, CHAR2TAG, and BOTH]

A Word (Model) from our Sponsor
● Our extrinsic results are on tagging
● Please consider us for all your WE use cases!
  ○ Sentiment!
  ○ Parsing!
  ○ IE!
  ○ QA!
  ○ ...
● Code compatible with w2v, Polyglot, FastText
● Models for Polyglot also on GitHub
  ○ <1MB each, DyNet format
  ○ Learn all OOVs in advance and add them to the parameter table, or
  ○ Load the model into memory and infer on-line
  (both modes sketched below)

Code & models: https://github.com/yuvalpinter/Mimick
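The two deployment modes above, sketched with the hypothetical embed_oov helper from earlier; `task_corpus_types` and `embedding_table` are assumed names, not the repo's API:

```python
# Mode 1: precompute a vector for every OOV type in the task corpus, once,
# and extend the embedding parameter table before training the tagger.
oov_types = {w for w in task_corpus_types if w not in embedding_table}
for w in oov_types:
    embedding_table[w] = embed_oov(model, w, char_to_id)

# Mode 2: keep the model in memory and infer lazily on first sight of an OOV.
def lookup(word):
    if word not in embedding_table:
        embedding_table[word] = embed_oov(model, word, char_to_id)
    return embedding_table[word]
```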

Conclusions
● MIMICK: an OOV-extension embedding processing step for downstream tasks
● A compositional model complementing a distributional artifact
● A powerful technique for low-resource scenarios
● Especially good for:
  ○ Morphologically-rich languages
  ○ Large character vocabularies
● Sore spots and future work
  ○ Vietnamese: syllabic vocabulary
  ○ Hebrew and Arabic: nontrivial tokenization, no case
  ○ Try other subword levels (morphemes, phonemes, bytes)
  ○ Improve the morphosyntactic attribute tagging scheme

Questions?

Code & models: https://github.com/yuvalpinter/Mimick