Automatic Sense Prediction for Implicit Discourse Relations in Text

Emily Pitler, Annie Louis, Ani Nenkova
University of Pennsylvania

Implicit Discourse Relations


I am in Singapore, but I live in the United States.
◦ Explicit Comparison

The main conference is over Wednesday. I am staying for EMNLP.
◦ Implicit Comparison

Implicit discourse relations are hard


I am here because I have a presentation to give at ACL.
◦ Explicit Contingency

I am a little tired; there is a 13-hour time difference.
◦ Implicit Contingency


Focus on implicit discourse relations
◦ in a realistic distribution
Better understanding of lexical features
◦ Showed they do not capture semantic oppositions
Empirical validation of new and old features
◦ Polarity, verb classes, context, and some lexical features indicate discourse relations

First experiments on implicits


Classify both implicits and explicits
◦ Same sentence [Soricut and Marcu, 2003]
◦ GraphBank corpus: doesn't distinguish implicit and explicit [Wellner et al., 2006]
Create artificial implicits by deleting the connective
◦ I am in Singapore, but I live in the United States.
◦ [Marcu and Echihabi, 2001; Blair-Goldensohn et al., 2007; Sporleder and Lascarides, 2008]

Related work on relation sense


Word Pairs Investigation


Most basic feature for implicits

I_there, I_is, …, tired_time, tired_difference

Word pairs as features
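As a rough illustration, here is a minimal Python sketch of this cross-product featurization (function and variable names are ours, not the paper's; tokenization is naive whitespace splitting):

```python
from itertools import product

def word_pair_features(arg1: str, arg2: str) -> set:
    """Cross product of Arg1 and Arg2 tokens, encoded as w1_w2 features."""
    tokens1 = arg1.lower().split()
    tokens2 = arg2.lower().split()
    return {f"{w1}_{w2}" for w1, w2 in product(tokens1, tokens2)}

# The example from the slides:
print(sorted(word_pair_features(
    "I am a little tired",
    "there is a 13 hour time difference",
)))  # ['a_13', ..., 'i_there', 'i_is', ..., 'tired_difference', ...]
```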


[Figure: every word of Arg1 "I am a little tired" is paired with every word of Arg2 "there is a 13-hour time difference".]

Marcu and Echihabi, 2001

The recent explosion of country funds mirrors the "closed-end fund mania" of the 1920s, Mr. Foot says, when narrowly focused funds grew wildly popular.

They fell into oblivion after the 1929 crash.

Intuition: with large amounts of data, we will find semantically related pairs.


Using just content words reduces performance (but has a steeper learning curve)
◦ Marcu and Echihabi, 2001
Nouns and adjectives don't help at all
◦ Lapata and Lascarides, 2004
Filtering out stopwords lowers results
◦ Blair-Goldensohn et al., 2007

Meta error analysis of prior work


Synthetic implicits: Cause/Contrast/None sentences
◦ Explicit instances from Gigaword with the connective deleted
◦ "Because" → Cause, "But" → Contrast
◦ At least 3 sentences apart → None
◦ Blair-Goldensohn et al., 2007
Random selection
◦ 5,000 Cause
◦ 5,000 Other
Computed information gain of word pairs

Word pairs experiments
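A minimal sketch of scoring a binary word-pair feature by information gain (this is our formulation; the slides do not spell out the exact computation):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_present, labels):
    """IG(class; feature) = H(class) - H(class | feature present/absent)."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(feature_present, labels) if x == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Toy example: does some word pair separate Cause from Other?
labels = ["Cause", "Cause", "Other", "Other"]
present = [True, True, False, False]
print(information_gain(present, labels))  # 1.0: perfectly informative
```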


The government says it has reached most isolated townships by now, but because roads are blocked, getting anything but basic food supplies to people remains difficult.

"but" → Comparison
"because" → Contingency

“but” signals “Not-Comparison” in synthetic data


Maybe even with lots and lots of data, we won’t see “popular…but…oblivion” that often

What are we trying to get at?

Popular ↔ Oblivion
Desirable ↔ Abhorrent
Mollify ↔ Enrage

Sentiment orientation relieves lexical sparsity


Features for sense prediction


Multi-perspective Question Answering (MPQA) Opinion Corpus
◦ Wilson et al., 2005
Sentiment words annotated as
◦ Positive
◦ Negative
◦ Both
◦ Neutral

Resource for Polarity Tags


Similar to word pairs, but with words replaced by their polarity tags

Arg1: Executives at Time Inc. Magazine Co., a subsidiary of Time Warner, have said the joint venture with Mr. Lang wasn’t a good one.

Arg2: The venture, formed in 1986, was supposed to be Time’s low-cost, safe entry into women’s magazines.

Arg1: NegatePositive; Arg2: Positive → feature Arg1NegatePositive_Arg2Positive

Polarity Tag pairs
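A rough sketch of polarity-tag pair extraction, with a toy dictionary standing in for the MPQA lexicon and a deliberately simplistic negation rule (the paper's actual negation handling may differ):

```python
# Toy stand-in for the MPQA subjectivity lexicon (the real resource is far larger).
POLARITY = {"good": "Positive", "safe": "Positive", "bad": "Negative"}
NEGATIONS = {"not", "n't", "never", "no"}

def polarity_tags(tokens):
    """Map sentiment words to polarity tags, prefixing 'Negate' after a negation."""
    tags, negated = [], False
    for tok in tokens:
        if tok.lower() in NEGATIONS:
            negated = True
            continue
        tag = POLARITY.get(tok.lower())
        if tag:
            tags.append(("Negate" if negated else "") + tag)
            negated = False
    return tags

def polarity_pair_features(arg1, arg2):
    """Cross product of the two arguments' polarity tags."""
    return {f"Arg1{t1}_Arg2{t2}"
            for t1 in polarity_tags(arg1) for t2 in polarity_tags(arg2)}

arg1 = "the joint venture was n't a good one".split()
arg2 = "supposed to be Time 's low-cost , safe entry".split()
print(polarity_pair_features(arg1, arg2))  # {'Arg1NegatePositive_Arg2Positive'}
```

The Inquirer Tag pairs on the next slide follow the same cross-product pattern, with General Inquirer semantic categories in place of polarity tags and only verbs considered.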


General Inquirer lexicon
◦ Stone et al., 1966
◦ Semantic categories of words
Complementary classes
◦ "Understatement" vs. "Overstatement"
◦ "Rise" vs. "Fall"
◦ "Pleasure" vs. "Pain"
Features: tag pairs, verbs only

Inquirer Tags


Newsweek's circulation for the first six months of 1989 was 3,288,453, flat from the same period last year

U.S. News' circulation in the same time was 2,303,328, down 2.6%

Probably WSJ-specific

Money/Percent/Num
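A hypothetical sketch of counting money, percent, and number tokens per argument; the exact token definitions are not given on the slides, so the regexes below are assumptions:

```python
import re

# Assumed token patterns; the slides do not define them precisely.
MONEY_RE   = re.compile(r"^\$\d[\d,.]*$")
PERCENT_RE = re.compile(r"^(\d[\d,.]*%|%|percent)$", re.IGNORECASE)
NUMBER_RE  = re.compile(r"^\d[\d,.]*$")

def money_percent_num_features(tokens):
    """Count money, percent, and plain-number tokens in one argument."""
    counts = {"money": 0, "percent": 0, "number": 0}
    for tok in tokens:
        if MONEY_RE.match(tok):
            counts["money"] += 1
        elif PERCENT_RE.match(tok):
            counts["percent"] += 1
        elif NUMBER_RE.match(tok):
            counts["number"] += 1
    return counts

print(money_percent_num_features("circulation was 2,303,328 , down 2.6 %".split()))
# {'money': 0, 'percent': 1, 'number': 2}
```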


Levin verb class level in the LCS database
◦ Levin, 1993; Dorr, 2001
◦ More related verbs ~ Expansion
Average length of verb chunk
◦ They [are allowed to proceed] ~ Contingency
◦ They [proceed] ~ Expansion, Temporal
POS tags of the main verb
◦ Same tense ~ Expansion
◦ Different tense ~ Contingency, Temporal

Verbs
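An illustrative sketch of the three verb features, with a toy dictionary standing in for the Levin/LCS verb-class database (class IDs below are made up for the example):

```python
# Toy stand-in for the LCS database's verb -> Levin-class mapping.
LEVIN_CLASS = {"proceed": {"51.1"}, "go": {"51.1"}, "say": {"37.7"}}

def shared_levin_classes(verbs1, verbs2):
    """Count verb pairs across the two arguments that share a Levin class."""
    return sum(1 for v1 in verbs1 for v2 in verbs2
               if LEVIN_CLASS.get(v1, set()) & LEVIN_CLASS.get(v2, set()))

def avg_verb_chunk_length(chunks):
    """Average words per verb chunk, e.g. [are, allowed, to, proceed] -> 4."""
    return sum(len(c) for c in chunks) / len(chunks)

def same_tense(main_verb_pos1, main_verb_pos2):
    """Compare POS tags of the two main verbs (e.g., VBD vs. VBZ)."""
    return main_verb_pos1 == main_verb_pos2

print(shared_levin_classes(["proceed"], ["go"]))                      # 1
print(avg_verb_chunk_length([["are", "allowed", "to", "proceed"]]))  # 4.0
print(same_tense("VBD", "VBZ"))                                      # False
```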


Prior work found first and last words very helpful in predicting sense
◦ Wellner et al., 2006
◦ Often explicit connectives

First-Last, First3
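A minimal sketch of these positional features (the feature names are illustrative; the paper may encode them differently):

```python
def first_last_features(arg1, arg2):
    """First word, last word, and first three words of each argument."""
    return {
        "arg1_first": arg1[0],
        "arg1_last": arg1[-1],
        "arg2_first": arg2[0],
        "arg2_last": arg2[-1],
        "arg1_first3": " ".join(arg1[:3]),
        "arg2_first3": " ".join(arg2[:3]),
    }

print(first_last_features(
    "I am a little tired".split(),
    "there is a 13 hour time difference".split(),
))
```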


Was preceding/following relation explicit?

◦ If so, which sense?

◦ If so, which connective?

Does Arg1 begin a paragraph?

Context
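A sketch of how the context features might be assembled, assuming each neighboring relation is available as an (is_explicit, sense, connective) tuple (this representation is our assumption):

```python
def context_features(prev_rel, next_rel, arg1_starts_paragraph):
    """Features from neighboring relations; prev_rel/next_rel are
    (is_explicit, sense, connective) tuples, or None at document edges."""
    feats = {"arg1_starts_paragraph": arg1_starts_paragraph}
    for name, rel in (("prev", prev_rel), ("next", next_rel)):
        if rel is None:
            continue
        is_explicit, sense, connective = rel
        feats[f"{name}_explicit"] = is_explicit
        if is_explicit:
            feats[f"{name}_sense"] = sense
            feats[f"{name}_connective"] = connective
    return feats

print(context_features((True, "Comparison", "but"), None, True))
```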


Largest available annotated corpus of discourse relations
◦ Penn Treebank WSJ articles
◦ 16,224 implicit relations between adjacent sentences

I am a little tired; [because] there is a 13-hour time difference.
◦ Contingency.Cause.Reason

Penn Discourse Treebank


Relation sense   Proportion of implicits
Expansion        53%
Contingency      26%
Comparison       15%
Temporal          6%

Top level senses in PDTB


Developed features on sections 0-1
Trained on sections 2-20
Tested on sections 21-22
Binary classification task for each sense
◦ Trained on equal numbers of positive and negative examples
◦ Tested on the natural distribution
Naïve Bayes classifier

Classification Experiments on PDTB Implicits
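A sketch of this training protocol: downsample the training data to balanced classes, fit a Naïve Bayes classifier over binary features, then evaluate by f-score on the natural test distribution. scikit-learn is used purely for illustration; the slides do not name the toolkit actually used:

```python
import random
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import BernoulliNB

def downsample_balanced(X, y, seed=0):
    """Keep equal numbers of positive and negative training examples."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    rng = random.Random(seed)
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: each instance is a dict of binary features (e.g., word pairs).
X = [{"i_there": 1}, {"good_safe": 1}, {"i_there": 1}, {"a_b": 1}, {"c_d": 1}]
y = [1, 0, 1, 0, 0]

X_bal, y_bal = downsample_balanced(X, y)
vec = DictVectorizer()
clf = BernoulliNB().fit(vec.fit_transform(X_bal), y_bal)

X_test, y_test = X, y  # in reality: held-out sections 21-22
pred = clf.predict(vec.transform(X_test))
print(f1_score(y_test, pred))
```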


Results


Motivation in prior work
◦ Train on synthetic implicits
What works better
◦ Train on actual implicits
Synthetic examples can still help!
◦ With only the best features selected from synthetic implicits

Word-pair f-scores:

Training setup                        Comp.   Cont.
Synthetic implicits                   17.13   31.10
Actual implicits                      20.96   43.79
Actual + synthetic-selected features  21.96   45.60

Results: Word pairs for comparison and contingency

Features             f-score
First-Last, First3   21.01
Context              19.32
Money/Percent/Num    19.04
Random                9.91

Results: Comparison


Polarity is actually the worst feature: 16.63

Instances with Positive-Negative or Negative-Positive pairs:

Comparison       30%
Not Comparison   31%

Distribution of Opposite Polarity Pairs


Features             f-score
First-Last, First3   36.75
Verbs                36.59
Context              29.55
Random               19.11

Results: Contingency


Features        f-score
Polarity Tags   71.29
Inquirer Tags   70.21
Context         67.77
Random          64.74

Results: Expansion


• Expansion is the majority class
• Precision is more problematic than recall
• These features all help other senses

Features             f-score
First-Last, First3   15.93
Verbs                12.61
Context              12.34
Random                5.38

Results: Temporal


Temporals often end with words like “Monday” or “yesterday”

Comparison
◦ Selected word pairs
Contingency
◦ Polarity, Verbs, First/Last, Modality, Context, Selected word pairs


Expansion
◦ Polarity, Inquirer Tags, Context
Temporal
◦ First/Last + word pairs

Best feature sets


Sense         Best f-score (baseline)
Comparison    21.96 (17.13)
Contingency   47.13 (31.10)
Expansion     76.41 (63.84)
Temporal      16.76 (16.21)

Best Results: f-scores


Comparison/Contingency baseline: word pairs trained on synthetic implicits
Expansion/Temporal baseline: word pairs trained on real implicits

Results above are from classifying each relation independently
◦ Naïve Bayes, MaxEnt, AdaBoost
Since context features were helpful, tried a CRF
6-way classification, word pairs as features
◦ Naïve Bayes accuracy: 43.27%
◦ CRF accuracy: 44.58%

Further experiments using context
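A sketch of recasting the task as sequence labeling over the adjacent relations in a document, using sklearn-crfsuite purely as an illustrative implementation (the slides do not name the CRF package used):

```python
import sklearn_crfsuite

# Each document is a sequence of adjacent-sentence-pair instances; each
# instance is a dict of (e.g., word-pair) features, and each label is a sense.
docs_X = [
    [{"i_there": 1.0}, {"good_safe": 1.0}],
    [{"a_b": 1.0}],
]
docs_y = [
    ["Contingency", "Comparison"],
    ["Expansion"],
]

# A linear-chain CRF lets the predicted sense of one relation inform
# its neighbors, which is the motivation for using context here.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(docs_X, docs_y)
print(crf.predict(docs_X))
```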


Focus on implicit discourse relations
◦ in a realistic distribution
Better understanding of word pairs
◦ Showed they do not capture semantic oppositions
Empirical validation of new and old features
◦ Polarity, verb classes, context, and some lexical features indicate discourse relations

Conclusion
