
Page 1: Semantic Pleonasm Detection

Semantic Pleonasm Detection

Omid Kashefi, Andrew Lucas, Rebecca Hwa

Intelligent Systems Program, University of Pittsburgh

December 2018

Page 2: What is Pleonasm?

• Pleonasm: the use of extraneous words in an expression such that removing them would not significantly alter the meaning of the expression.

• Pleonasm has different aspects and can be formed at different layers of language:

• Morphemic (e.g., “irregardless”)

• Syntactic (e.g., “the most kindest”)

• Semantic (e.g., “I received a free gift”)

• Pleonasms fall within the scope of GEC research, especially when they cause errors

Page 3: What is Semantic Pleonasm?

• Semantic Pleonasm: when the meaning of a word (or phrase) is already implied by other words in the sentence

• “A question of style or taste, not grammar” (Evans et al., 1957)

• Might have some literary functions

• Most modern style guides caution against them in favor of concise writing

Page 4: Challenges of Detecting Semantic Pleonasm

• Semantic pleonasm is a complex linguistic phenomenon

• There are no appropriate resources to support the development of such systems

• Lack of good strategies to build such resources

• Some GEC corpora (e.g., NUCLE) have “redundant” annotation:

• a manifestation of grammar errors

• e.g., “we still have room to improve for our current welfare system”

• rather than a stylistic redundancy

• e.g., “we aim to better improve our welfare system”

• Using NUCLE and other GEC corpora does not allow us to separate the question of redundancy from grammaticality

Page 5: Semantic Pleonasm Corpus (SPC)

• Raw Data

• Round Seven of the Yelp Dataset Challenge

• The writing is more casual

• The writing is often more emotional

• The writing is more likely to contain semantic pleonasms

Page 6: Semantic Pleonasm Corpus (SPC)

• Annotation Principles

1. Do not annotate directly over raw text, because most sentences do not contain pleonasms

2. Negative examples should be challenging

3. Avoid sentences with obvious grammar errors

• Filter for sentences containing a pair of adjacent semantically similar words (via WordNet; see the sketch below)

• Filter out ungrammatical sentences
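The similarity filter can be built with off-the-shelf WordNet tools. A minimal sketch, assuming NLTK's WordNet interface; the slides do not specify the similarity measure or cutoff actually used for the SPC, so path similarity and the 0.3 threshold here are illustrative assumptions:

```python
# Sketch of the candidate filter: flag sentences that contain a pair of
# adjacent, semantically similar words according to WordNet. The measure
# (path similarity) and threshold (0.3) are illustrative assumptions.
from nltk.corpus import wordnet as wn

def max_synset_similarity(w1, w2):
    """Highest path similarity over all synset pairs of two words."""
    return max(
        (s1.path_similarity(s2) or 0.0
         for s1 in wn.synsets(w1)
         for s2 in wn.synsets(w2)),
        default=0.0,
    )

def has_similar_adjacent_pair(tokens, threshold=0.3):
    """True if any two adjacent tokens are WordNet-similar."""
    return any(
        max_synset_similarity(a, b) >= threshold
        for a, b in zip(tokens, tokens[1:])
    )

# Check a candidate sentence (output depends on WordNet coverage)
print(has_similar_adjacent_pair("just plain pure fruit pulp".split()))
```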


Page 7: Semantic Pleonasm Corpus (SPC)

• Annotation Procedures

• Our annotators are from Amazon Mechanical Turk

• Turkers are shown six sentences at a time, each containing a pair of semantically similar adjacent words, and decide whether to delete the first word, the second word, both, or neither

• Each sentence is reviewed by three different Turkers

• Final annotation is based on majority consensus


Page 8: Semantic Pleonasm Corpus (SPC)

• Examples

• “Freshly squeezed and no additives, just plain pure fruit pulp”

• Consensus: plain is redundant

• “It is clear that I will never have another prime first experience like the one I had at Chompies.”

• Consensus: neither word is redundant

• “The dressing is absolutely incredibly fabulously flavorful!”

• Consensus: both words are redundant

Page 9: Semantic Pleonasm Corpus (SPC)

• Statistics

            One (First)   One (Second)   Both   Neither    Total
Count               955            765     16     1,283    3,019
Percent             32%            25%     1%       42%     100%

One (either word): 57%

Page 10: Semantic Pleonasm Corpus (SPC)

• Inter-Annotator Agreement

• Word Level — whether the first, second, both, or neither of the candidates is pleonastic

• Sentence Level — whether a sentence has a pleonastic construction


Consensus Level    Fleiss’s Kappa
Word Level         0.384
Sentence Level     0.482
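For reference, Fleiss's kappa can be computed with off-the-shelf tools. A minimal sketch using statsmodels on invented three-Turker, sentence-level judgments (1 = pleonastic, 0 = not); the ratings below are toy data, not actual SPC annotations:

```python
# Fleiss' kappa over toy sentence-level judgments from three annotators.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = sentences, columns = the three annotators' labels
ratings = np.array([
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
])

# aggregate_raters turns raw labels into per-item category counts
table, _ = aggregate_raters(ratings)
print(fleiss_kappa(table))
```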

Page 11: Automatic Pleonasm Detection

• SPC can serve as a valuable resource for developing systems to detect semantic pleonasm

• Claim 1: the actual performance of word redundancy metrics is hampered by the mismatch between the intended domain (i.e., semantic pleonasm) and the available corpora they are evaluated on (i.e., GEC corpora such as NUCLE)

• SPC is focused on the desired target domain

• Claim 2: without appropriate negative examples, it is not clear how to apply word redundancy metrics to sentences with no redundancy

• SPC contains negative examples, so it is suitable for training sentence classifiers

Page 12: Automatic Pleonasm Detection

• Detecting Most Redundant Word

• Validating claim 1: compare the performance of word redundancy metrics on SPC with their performance on NUCLE, given sentences known to contain a redundant word

• 1,140 NUCLE and 1,720 SPC sentences

• Baselines

• Xue&Hwa: a combination of fluency and word meaning contribution models

• SIM: semantic similarity between the full sentence and the sentence with the target word removed

• GEN: the degree to which a word is general, measured by its number of synonyms

• SMP: the simplicity of a word, based on the Flesch-Kincaid readability score

• GEC: a GEC system built on LanguageTool; we expect it to do better on NUCLE than on SPC (see the sketch of SIM and GEN below)
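To make two of these word-level scores concrete, here is a rough sketch of SIM and GEN. The actual models behind the baselines are not specified on the slides, so TF-IDF cosine similarity (for SIM) and WordNet synonym counts (for GEN) are crude stand-ins chosen only to illustrate the idea:

```python
# Sketch of two word-level redundancy scores (illustrative stand-ins).
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sim_score(tokens, i):
    """SIM: similarity between the full sentence and the sentence with
    token i removed; a redundant word barely changes the meaning, so a
    high score suggests redundancy."""
    full = " ".join(tokens)
    reduced = " ".join(tokens[:i] + tokens[i + 1:])
    tfidf = TfidfVectorizer().fit_transform([full, reduced])
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0]

def gen_score(word):
    """GEN: how general a word is, proxied by its number of WordNet synonyms."""
    return len({lemma.name() for s in wn.synsets(word) for lemma in s.lemmas()})

tokens = "I received a free gift".split()
for i, tok in enumerate(tokens):
    print(tok, round(sim_score(tokens, i), 3), gen_score(tok))
```

A real system would use a stronger sentence-similarity model than two-document TF-IDF, but the scoring loop has the same shape: score every word, then pick the highest-scoring one as the most redundant candidate.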

Page 13: Automatic Pleonasm Detection

• Detecting Most Redundant Word

• Validating claim 1: SPC, while small, is a better fit for the task than NUCLE

Method         NUCLE     SPC
Xue&Hwa        22.8%   31.7%
SIM            11.1%   16.6%
GEN             9.6%   13.3%
SMP            16.1%   20.6%
SIM+SMP+GEN    18.2%   27.6%
ALL            31.1%   39.4%
GEC            11.9%    4.7%

Page 14: Automatic Pleonasm Detection

• Detecting Sentences with Pleonasm

• Validating claim 2: use the whole SPC to train binary classifiers

• Baselines

• UG: one-hot representation of the sentence

• TG: one-hot representation of the trigrams of the sentence

• TFIDF: one-hot representation of the smoothed TF-IDF tuples of the sentence

• WSTAT: [max(ALL), avg(ALL), min(ALL), len(s), LM(s)], i.e., statistics of the combined word redundancy scores over the sentence, plus its length and language model score (see the sketch below)
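To make the sentence-level setup concrete, here is a minimal sketch of the classification step: TF-IDF features feeding a MaxEnt (logistic regression) and a Naive Bayes classifier, matching the columns of the results on the next slide. The sentences and labels below are invented for illustration and are not SPC data:

```python
# Binary "contains a pleonasm?" classifiers over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression  # MaxEnt
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "no additives, just plain pure fruit pulp",
    "I will never have another experience like that one",
    "the dressing is absolutely incredibly fabulously flavorful",
    "we aim to improve our welfare system",
]
labels = [1, 0, 1, 0]  # 1 = pleonastic (toy labels for illustration)

for name, clf in [("MaxEnt", LogisticRegression(max_iter=1000)),
                  ("NB", MultinomialNB())]:
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(sentences, labels)
    print(name, model.predict(["it was a free gift, plain and pure"]))
```

In the combined baselines (WSTAT+UG, etc.), the five WSTAT statistics would presumably be appended to these sparse feature vectors.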


Page 15: Automatic Pleonasm Detection

• Detecting Sentences with Pleonasm

• Encoding the words themselves is more informative than statistics over the word redundancy metrics

Baseline       MaxEnt      NB    (on SPC)
UG              79.2%   88.4%
TG              79.9%   88.8%
TFIDF           83.0%   90.5%
WSTAT           63.1%   53.2%
WSTAT+UG        82.3%   89.2%
WSTAT+TG        83.7%   89.3%
WSTAT+TFIDF     84.5%   92.2%

Page 16: Conclusion

• We have introduced SPC, in which:

• Each sentence contains a word pair that is potentially semantically related

• These sentences have been reviewed by human annotators, who determine whether any of the words are redundant

• SPC offers two main contributions

• By focusing on semantic similarity, it provides a more appropriate resource for systems that aim to detect stylistic redundancy rather than grammatical errors

• By balancing positive and near-miss negative examples, it allows systems to evaluate their ability to detect “no redundancy.”

Page 17: Acknowledgment

• This material is based upon work supported by the National Science Foundation under Grant No. 1735752

• This work was published at NAACL 2018


Page 18: Thank You