92
A Distributional and Orthographic Aggregation Model for English Derivational Morphology Daniel Deutsch,* John Hewitt,* and Dan Roth *equal contribution

A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

A Distributional and Orthographic Aggregation Model for English Derivational Morphology

Daniel Deutsch,* John Hewitt,* and Dan Roth

*equal contribution

Page 2: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

2

Co-Authors

John HewittCo-First Author

Dan RothAdvisor

Page 3: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

3

Derivational Morphology

employer

employemployment

intensely

intenseintensity

Page 4: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

4

Derivational Morphology

employer

employemployment

intensely

intenseintensity

root wordtransformation

derived word

Page 5: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

5

Derivational Morphology

employer

employemployment

intensely

intenseintensity

root wordtransformation

derived word

Page 6: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

6

Motivation

• Machine translation

• Text simplification

• Language generation

Page 7: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

7

Challenges

• Suffix ambiguity

• Orthographic irregularity

Page 8: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

8

Suffix Ambiguity

“I have an observament!”

Page 9: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

9

Suffix Ambiguity

ground

grounding

*groundation

*groundment

*groundal

“I have an observament!”

Result

Page 10: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

10

Suffix Ambiguity

validvalidity

*validnessNominal

ground

grounding

*groundation

*groundment

*groundal

“I have an observament!”

Result

Page 11: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

11

Orthographic Irregularity

speak speechResult

Page 12: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

12

Orthographic Irregularity

speak

creak

speechResult

Result

Page 13: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

13

Orthographic Irregularity

speak

creak

speech

*creech

Result

Result

Page 14: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

*creech

14

Orthographic Irregularity

speak

creak

speech

creaking

Result

Result

Page 15: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

*creech

15

Orthographic Irregularity

speak

creak

erupt

speech

creaking

eruption

Result

Result

Result

Page 16: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

*creech

16

Orthographic Irregularity

speak

creak

erupt

bankrupt

speech

creaking

eruption

Result

Result

Result

Result

Page 17: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

*creech

17

Orthographic Irregularity

speak

creak

erupt

bankrupt

speech

creaking

eruption

*bankruption

Result

Result

Result

Result

Page 18: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

18

Orthographic Irregularity

speak speech

creak *creech

erupt eruption

bankrupt *bankruption bankruptcy

creaking

Result

Result

Result

Result

Page 19: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

19

Model Overview

wise

Adverb+ wisely

aggregation

distributional:

orthographic irregularity

orthographic:

suffix ambiguity

Page 20: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

20

Model Overview

wise

Adverb+

Page 21: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

21

Model Overview

distributional:

orthographic irregularity

orthographic:

suffix ambiguity

Page 22: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

22

Model Overview

wisely

aggregation

Page 23: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

23

Model Overview

wise

Adverb+ wisely

aggregation

distributional:

orthographic irregularity

orthographic:

suffix ambiguity

Page 24: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

24

Model Overview

orthographic:

suffix ambiguity

Page 25: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

25

Orthographic Model

• Seq2Seq baseline

• Dictionary-constrained decoding

• Reranking with frequency information

Page 26: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

26

Seq2Seq Baseline

c o m p o s e #

Re

su

lt#

c o m p o s i

Page 27: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

27

Seq2Seq Baseline

c o m p o s e ##

Re

su

lt

Page 28: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

28

Seq2Seq Baseline

c o m p o s e ##

Page 29: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

29

Seq2Seq Baseline

Re

su

lt

Page 30: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

30

Seq2Seq Baselinec o m p o s i

Page 31: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

• Seq2Seq models generate many unattested words, but are reasonable guesses

31

Dictionary-Constrained Decoding

ground

grounding

*groundation

*groundment

*groundalResult

Suffix Ambiguity

Page 32: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

• Seq2Seq models generate many unattested words, but are reasonable guesses

• Intuition: constrain model to only generate known words

32

Dictionary-Constrained Decoding

ground

grounding

*groundation

*groundment

*groundalResult

Suffix Ambiguity

Page 33: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

33

Dictionary-Constrained Decoding

Page 34: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

34

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 35: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

35

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 36: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

36

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 37: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

37

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 38: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

38

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 39: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

39

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 40: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

40

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 41: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

41

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

Page 42: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

42

Dictionary-Constrained Decoding

Search over trie

induced from dictionary

#

a

b

ab

aba

ba

#

Page 43: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

43

Reranking with Frequency Information

refute Result

Page 44: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

44

Reranking with Frequency Information

refution

refutation

refut

refuty

refutat

-1.1

-1.2

-4.8

-5.6

-8.7

refute Result

Model Output Model Score

Page 45: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

45

Reranking with Frequency Information

refution

refutation

refut

refuty

refutat

-1.1

-1.2

-4.8

-5.6

-8.7

refute Result

Model Output Model Score

Page 46: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

46

Reranking with Frequency Information

refution

refutation

refut

refuty

refutat

-1.1

-1.2

-4.8

-5.6

-8.7

5.0

14.3

7.4

0.1

8.6

refute Result

Model Output Model ScoreLog Corpus

Freq

Page 47: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

47

Reranking with Frequency Information

Model Output

refution

refutation

refut

refuty

refutat

Model Score

-1.1

-1.2

-4.8

-5.6

-8.7

Log Corpus

Freq

5.0

14.3

7.4

0.1

8.6

Reranker

Output

refutation

refution

refut

refuty

refutat

Reranker

Score

0.5

-0.9

-0.9

-0.9

-0.9

refute Result

Page 48: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

48

Model Overview

orthographic:

suffix ambiguity

Page 49: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

49

Model Overview

distributional:

orthographic irregularity

Page 50: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

• Orthographic information can be unreliable

• Semantic transformation remains the same

50

Distributional Model

*creech

speak

creak

speech

creaking

Orthographic

Irregularity

Result

Result

Page 51: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

51

Distributional Model

Intuition

Page 52: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

52

Distributional Model

Intuition

Page 53: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

53

Distributional Model

Intuition

Learn non-linear function

per transformation

Page 54: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

54

Distributional Model

Intuition

Learn non-linear function

per transformation

Independent of

orthography

Page 55: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

55

Distributional Model

non-linear

function

Page 56: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

56

Model Overview

distributional:

orthographic irregularity

Page 57: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

57

Model Overview

wisely

aggregation

Page 58: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

58

Aggregation Model

approval

Orthographic

approvation

-0.2

Distributional

approval

-0.1

non-linear

function

Page 59: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

59

Aggregation Model

Ortho Score ScoreDistributional

-0.9

-0.3

-0.5

-0.8

-0.6

-0.8

-1.1

-0.9

approvation

bankruption

expertly

stroller

approval

bankruptcy

expertly

strolls

Page 60: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

60

Aggregation Model

Ortho Score ScoreDistributional

-0.9

-0.3

-0.8

-0.6

-0.8

-0.9

approvation

bankruption

stroller

approval

bankruptcy

strolls

Page 61: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

61

Aggregation Model

Ortho Score Score Aggregation Selection

approvation

bankruption

expertly

stroller

Distributional

approval

bankruptcy

expertly

strolls

-0.9

-0.3

-0.5

-0.8

-0.6

-0.8

-1.1

-0.9

approval

bankruption

expertly

stroller

Page 62: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Experiments

62

Page 63: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Dataset

Transformation Count Example

Adverb 1715

Result 1251

Agent 801

Nominal 354

recite recital

overstate overstatement

simulate simulation

wise wisely

survive survivor

yodel yodeler

effective effectiveness

pessimistic pessimism

intense intensity

Cotterell et al. 2017

63

Page 64: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

64

Experiment Details

• 30 random restarts

• Token information: Google Book NGrams

– 360k unigram types

– Token counts aggregated

• Google News pre-trained word embeddings

• Evaluation: full-token match accuracy

Page 65: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

65

Results Legend

Seq2Seq

Distributional

Aggregation

Dictionary-Constrained Decoding

Frequency-Based Reranking

Page 66: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

66

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Page 67: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Cotterell et al.

2017

Constrained

67

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Page 68: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

68

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Page 69: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

69

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Significant improvement when combining Dist and Seq

Page 70: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

70

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Frequency statistics are a valuable signal

Page 71: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

71

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Combined model still outperforms separate models

Page 72: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

72

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Page 73: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Cotterell et al.

2017 73

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

22% and 37% relative error reductions over Seq

Page 74: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

74

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

Page 75: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Baseline

Aggr

Aggr+Freq+Dict

75

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Cotterell et al.

2017

Page 76: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Baseline

Aggr

Aggr+Freq+Dict

76

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Cotterell et al.

2017

Page 77: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Baseline

Aggr

77

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Aggr+Freq+Dict

Cotterell et al.

2017

Page 78: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

78

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

Page 79: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

79

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

Page 80: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

80

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

equivalent {equivalence, equivalency}Nominal

Page 81: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

81

What does each model do well?

Page 82: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

82

What does each model do well?

Orthographic

does best

Distributional

does best

Page 83: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

83

What does each model do well?

Orthographic

does best

Distributional

does best

Page 84: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

84

What does each model do well?

Orthographic

does best

Distributional

does best

Page 85: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

85

What does each model do well?

*sponsorment

sponsorship

Orthographic

does best

Distributional

does best

Page 86: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

86

What does each model do well?

Orthographic

does best

Distributional

does best

Page 87: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

87

What does each model do well?

Orthographic

does best

Distributional

does best

wise wiselyAdverb

quick quicklyAdverb

Page 88: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

88

What does each model do well?

Orthographic

does best

Distributional

does best

wise wiselyAdverb

quick quicklyAdverb

Result

renew renewal

invest investmentResult

inspire inspirationResult

Page 89: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

89

Conclusion

• Aggregation model for English derivational morphology

• Dictionary-constrained decoding

• Frequency-based reranking

• Distributional model per-transformation

• Best open- and closed-vocabulary models demonstrate 22% and 37% reduction in error– New state-of-the-art results

Page 90: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

90

Code & Data

Code

https://github.com/danieldeutsch/derivational-morphology

Data

https://github.com/ryancotterell/derivational-paradigms

Powered by

Page 91: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

91

References

• Cotterell et al. 2017, Paradigm completion for derivational morphology. In EMNLP

Page 92: A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency information. 26 Seq2Seq Baseline c o m p o s e # esult # c o m p o s i. 27 Seq2Seq

Thank you!