A Distributional and Orthographic Aggregation Model for ...€¦ · •Reranking with frequency...

Preview:

Citation preview

A Distributional and Orthographic Aggregation Model for English Derivational Morphology

Daniel Deutsch,* John Hewitt,* and Dan Roth

*equal contribution

2

Co-Authors

John HewittCo-First Author

Dan RothAdvisor

3

Derivational Morphology

employer

employemployment

intensely

intenseintensity

4

Derivational Morphology

employer

employemployment

intensely

intenseintensity

root wordtransformation

derived word

5

Derivational Morphology

employer

employemployment

intensely

intenseintensity

root wordtransformation

derived word

6

Motivation

• Machine translation

• Text simplification

• Language generation

7

Challenges

• Suffix ambiguity

• Orthographic irregularity

8

Suffix Ambiguity

“I have an observament!”

9

Suffix Ambiguity

ground

grounding

*groundation

*groundment

*groundal

“I have an observament!”

Result

10

Suffix Ambiguity

validvalidity

*validnessNominal

ground

grounding

*groundation

*groundment

*groundal

“I have an observament!”

Result

11

Orthographic Irregularity

speak speechResult

12

Orthographic Irregularity

speak

creak

speechResult

Result

13

Orthographic Irregularity

speak

creak

speech

*creech

Result

Result

*creech

14

Orthographic Irregularity

speak

creak

speech

creaking

Result

Result

*creech

15

Orthographic Irregularity

speak

creak

erupt

speech

creaking

eruption

Result

Result

Result

*creech

16

Orthographic Irregularity

speak

creak

erupt

bankrupt

speech

creaking

eruption

Result

Result

Result

Result

*creech

17

Orthographic Irregularity

speak

creak

erupt

bankrupt

speech

creaking

eruption

*bankruption

Result

Result

Result

Result

18

Orthographic Irregularity

speak speech

creak *creech

erupt eruption

bankrupt *bankruption bankruptcy

creaking

Result

Result

Result

Result

19

Model Overview

wise

Adverb+ wisely

aggregation

distributional:

orthographic irregularity

orthographic:

suffix ambiguity

20

Model Overview

wise

Adverb+

21

Model Overview

distributional:

orthographic irregularity

orthographic:

suffix ambiguity

22

Model Overview

wisely

aggregation

23

Model Overview

wise

Adverb+ wisely

aggregation

distributional:

orthographic irregularity

orthographic:

suffix ambiguity

24

Model Overview

orthographic:

suffix ambiguity

25

Orthographic Model

• Seq2Seq baseline

• Dictionary-constrained decoding

• Reranking with frequency information

26

Seq2Seq Baseline

c o m p o s e #

Re

su

lt#

c o m p o s i

27

Seq2Seq Baseline

c o m p o s e ##

Re

su

lt

28

Seq2Seq Baseline

c o m p o s e ##

29

Seq2Seq Baseline

Re

su

lt

30

Seq2Seq Baselinec o m p o s i

• Seq2Seq models generate many unattested words, but are reasonable guesses

31

Dictionary-Constrained Decoding

ground

grounding

*groundation

*groundment

*groundalResult

Suffix Ambiguity

• Seq2Seq models generate many unattested words, but are reasonable guesses

• Intuition: constrain model to only generate known words

32

Dictionary-Constrained Decoding

ground

grounding

*groundation

*groundment

*groundalResult

Suffix Ambiguity

33

Dictionary-Constrained Decoding

34

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

35

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

36

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

37

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

38

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

39

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

40

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

41

Dictionary-Constrained Decoding

#

a

b

ab

aa

bb

aba

abb

baa

bab

ba

#

42

Dictionary-Constrained Decoding

Search over trie

induced from dictionary

#

a

b

ab

aba

ba

#

43

Reranking with Frequency Information

refute Result

44

Reranking with Frequency Information

refution

refutation

refut

refuty

refutat

-1.1

-1.2

-4.8

-5.6

-8.7

refute Result

Model Output Model Score

45

Reranking with Frequency Information

refution

refutation

refut

refuty

refutat

-1.1

-1.2

-4.8

-5.6

-8.7

refute Result

Model Output Model Score

46

Reranking with Frequency Information

refution

refutation

refut

refuty

refutat

-1.1

-1.2

-4.8

-5.6

-8.7

5.0

14.3

7.4

0.1

8.6

refute Result

Model Output Model ScoreLog Corpus

Freq

47

Reranking with Frequency Information

Model Output

refution

refutation

refut

refuty

refutat

Model Score

-1.1

-1.2

-4.8

-5.6

-8.7

Log Corpus

Freq

5.0

14.3

7.4

0.1

8.6

Reranker

Output

refutation

refution

refut

refuty

refutat

Reranker

Score

0.5

-0.9

-0.9

-0.9

-0.9

refute Result

48

Model Overview

orthographic:

suffix ambiguity

49

Model Overview

distributional:

orthographic irregularity

• Orthographic information can be unreliable

• Semantic transformation remains the same

50

Distributional Model

*creech

speak

creak

speech

creaking

Orthographic

Irregularity

Result

Result

51

Distributional Model

Intuition

52

Distributional Model

Intuition

53

Distributional Model

Intuition

Learn non-linear function

per transformation

54

Distributional Model

Intuition

Learn non-linear function

per transformation

Independent of

orthography

55

Distributional Model

non-linear

function

56

Model Overview

distributional:

orthographic irregularity

57

Model Overview

wisely

aggregation

58

Aggregation Model

approval

Orthographic

approvation

-0.2

Distributional

approval

-0.1

non-linear

function

59

Aggregation Model

Ortho Score ScoreDistributional

-0.9

-0.3

-0.5

-0.8

-0.6

-0.8

-1.1

-0.9

approvation

bankruption

expertly

stroller

approval

bankruptcy

expertly

strolls

60

Aggregation Model

Ortho Score ScoreDistributional

-0.9

-0.3

-0.8

-0.6

-0.8

-0.9

approvation

bankruption

stroller

approval

bankruptcy

strolls

61

Aggregation Model

Ortho Score Score Aggregation Selection

approvation

bankruption

expertly

stroller

Distributional

approval

bankruptcy

expertly

strolls

-0.9

-0.3

-0.5

-0.8

-0.6

-0.8

-1.1

-0.9

approval

bankruption

expertly

stroller

Experiments

62

Dataset

Transformation Count Example

Adverb 1715

Result 1251

Agent 801

Nominal 354

recite recital

overstate overstatement

simulate simulation

wise wisely

survive survivor

yodel yodeler

effective effectiveness

pessimistic pessimism

intense intensity

Cotterell et al. 2017

63

64

Experiment Details

• 30 random restarts

• Token information: Google Book NGrams

– 360k unigram types

– Token counts aggregated

• Google News pre-trained word embeddings

• Evaluation: full-token match accuracy

65

Results Legend

Seq2Seq

Distributional

Aggregation

Dictionary-Constrained Decoding

Frequency-Based Reranking

66

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Cotterell et al.

2017

Constrained

67

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

68

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

69

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Significant improvement when combining Dist and Seq

70

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Frequency statistics are a valuable signal

71

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Combined model still outperforms separate models

72

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

Cotterell et al.

2017

Cotterell et al.

2017 73

Results

40

45

50

55

60

65

70

75

80

85

Dist Seq Aggr Seq+Freq Aggr+Freq

Toke

n A

ccura

cy

Unconstrained

Constrained

22% and 37% relative error reductions over Seq

74

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

Baseline

Aggr

Aggr+Freq+Dict

75

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Cotterell et al.

2017

Baseline

Aggr

Aggr+Freq+Dict

76

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Cotterell et al.

2017

Baseline

Aggr

77

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Aggr+Freq+Dict

Cotterell et al.

2017

78

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

79

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

80

Results by Transformation

0

10

20

30

40

50

60

70

80

90

100

Nominal Result Agent Adverb

Toke

n A

ccura

cy

Baseline

Aggr

Aggr+Freq+Dict

Cotterell et al.

2017

equivalent {equivalence, equivalency}Nominal

81

What does each model do well?

82

What does each model do well?

Orthographic

does best

Distributional

does best

83

What does each model do well?

Orthographic

does best

Distributional

does best

84

What does each model do well?

Orthographic

does best

Distributional

does best

85

What does each model do well?

*sponsorment

sponsorship

Orthographic

does best

Distributional

does best

86

What does each model do well?

Orthographic

does best

Distributional

does best

87

What does each model do well?

Orthographic

does best

Distributional

does best

wise wiselyAdverb

quick quicklyAdverb

88

What does each model do well?

Orthographic

does best

Distributional

does best

wise wiselyAdverb

quick quicklyAdverb

Result

renew renewal

invest investmentResult

inspire inspirationResult

89

Conclusion

• Aggregation model for English derivational morphology

• Dictionary-constrained decoding

• Frequency-based reranking

• Distributional model per-transformation

• Best open- and closed-vocabulary models demonstrate 22% and 37% reduction in error– New state-of-the-art results

90

Code & Data

Code

https://github.com/danieldeutsch/derivational-morphology

Data

https://github.com/ryancotterell/derivational-paradigms

Powered by

91

References

• Cotterell et al. 2017, Paradigm completion for derivational morphology. In EMNLP

Thank you!

Recommended