
Parsing Discourse Relations

Giuseppe Riccardi Signals and Interactive Systems Lab

University of Trento, Italy

Behavioral Analytics

Parser Run on Genia Corpus Among 25 cases, 2 homozygous deletions and 1 hemizygous deletion were found in HCC samples. No point mutation was identified in the remaining 22 tumor samples without p16 gene deletions. Hypermethylation was detected in 24% (6/25) of tumor samples. However, the corresponding non-tumor liver tissue specimens were always unmethylated at the p16 locus. Loss of p16 protein expression occurred in 16 of 35 (45.7%) tumor samples, and all the non-tumor liver tissue specimens showed positive p16 staining. For the 25 cases examined for p16 gene alterations, the loss of p16 protein expression was observed in all tumors with p16 gene alterations and also in 3 tumors without p16 gene alterations. (Source: Genia corpus)


Parser Output:
•  Hypermethylation was detected in 24% (6/25) of tumor samples However(Comparison) the corresponding non-tumor liver tissue specimens were always unmethylated at the p16 locus
•  Loss of p16 protein expression occurred in 16 of 35 (45.7%) tumor samples and(Expansion) all the non-tumor liver tissue specimens showed positive p16 staining

Social Media User Opinions: Negative

The acting is below average, even from the likes of Curtis. You're more likely to get a kick out of her work in Halloween H20. Sutherland is wasted and Baldwin, well, he's acting like a Baldwin, of course. The real star here are Stan Winston's robot design, some schnazzy CGI, and the occasional good gore shot, like picking into someone's brain. So, if robots and body parts really turn you on, here's your movie. Otherwise, it's pretty much a sunken ship of a movie.

5/1/12 5

Social Media User Opinions: Positive

From here on, the plot takes a back seat, and we are treated to some of the best camera work and action staged. Most all the action is plausible and will hold you at the edge of your seat. There are a few melodramatic parts here, but, they tend to work out well. There is no general antagonist in this film, but the action and suspense makes you forget all about that. Daylight is a great film, I saw a non-matinee showing of it, and I thought it was worth every penny. The characterizations are mostly flat, one dimesional, but they have enough in them to get you to care for some of the characters. Rob Cohen (Dragonheart) does a great job with this film.


Discourse Relation Parsing

Giuseppe Riccardi

•  Joint work with
–  Sucheta Ghosh, U. Trento
–  Richard Johansson, U. Trento/U. Gothenburg
–  Sara Tonelli, FBK-Irst

Ghosh S., Tonelli S., Riccardi G. and Johansson R., “End-to-End Discourse Parser Evaluation”, IEEE International Conference on Semantic Computing, Menlo Park, USA, 2011

Ghosh S., Johansson R., Riccardi G. and Tonelli S., “Shallow Discourse Parsing with Conditional Random Fields”, International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, 2011

Ghosh S., Johansson R., Riccardi G. and Tonelli S., “Improving Recall Through Global Constraint Selection”, to appear in LREC, 2012

Discourse Parser

–  From raw text extract:
–  Discourse relations:
    •  Discourse Predicate (Connective)
    •  Connective sense
    •  Arg1
    •  Arg2
–  Explicit Connective
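The four items extracted per relation can be pictured as one small record. The sketch below is only an illustration: the class name `DiscourseRelation` and its field names are assumptions, not the parser's actual data model. The example values come from the Genia parser output shown earlier, with Arg2 taken as the span attached to the connective (the PDTB convention).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical container for one explicit discourse relation."""
    connective: str  # discourse predicate, e.g. "However"
    sense: str       # connective sense, e.g. "Comparison"
    arg1: str        # text span of the first argument
    arg2: str        # text span of the argument bound to the connective


rel = DiscourseRelation(
    connective="However",
    sense="Comparison",
    arg1="Hypermethylation was detected in 24% (6/25) of tumor samples",
    arg2=("the corresponding non-tumor liver tissue specimens were "
          "always unmethylated at the p16 locus"),
)
print(f"{rel.connective} ({rel.sense})")
```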

Giuseppe Riccardi

Parsing Architecture

Parser end2end Architecture (pipeline diagram)

•  Doc → Parser: Stanford (K&M) → Parse_Tree
•  Chunklink: by Sabine Buchholz; CoNLL’00 task
•  AddDiscourse: Pitler & Nenkova ‘09; Conn. SenseDet.
•  RootExtract + Morpha: Johansson + Minnen et al; Morph & All Feat
•  Windowing (-2,+2) → Arg2 → Arg1
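The windowing step can be sketched as follows, assuming that (-2,+2) counts sentences before and after the sentence that contains the connective; the slide does not state the unit, so this reading is an assumption. The function name `sentence_window` is hypothetical.

```python
def sentence_window(sentences, conn_idx, before=2, after=2):
    """Return the sentences from conn_idx-before to conn_idx+after.

    The window is clipped at the document boundaries.
    Assumption: the window unit is the sentence.
    """
    start = max(0, conn_idx - before)
    end = min(len(sentences), conn_idx + after + 1)
    return sentences[start:end]


doc = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]
print(sentence_window(doc, 3))  # connective sentence in the middle
print(sentence_window(doc, 0))  # connective sentence at document start
```

Clipping at the boundaries keeps the candidate search space for Arg1 and Arg2 small without failing on connectives near the start or end of a document.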

Features: Example

Selected Features: Arg1

Features used for Arg1 and Arg2 segmentation and labeling:
F1. Token (T)
F2. Sense of Connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)

Additional features used only for Arg1:
F9. Previous Sentence (PREV)
F10. Arg2 Labels
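A per-token feature row for F1 to F8 might be assembled as in the sketch below. The function name, the dictionary keys, and all example values are hypothetical, and reading BMV as "this token is the main verb" is an assumption. In a real run the values would come from the parse tree, Chunklink, AddDiscourse and Morpha.

```python
def token_features(token, conn_sense, iob_chain, pos, lemma, inflection,
                   main_verb):
    """Build one feature row (F1-F8) for a single token.

    Assumption: BMV is 1 when the token is the main verb of the
    main clause and 0 otherwise.
    """
    return {
        "T": token,              # F1
        "CONN": conn_sense,      # F2
        "IOB": iob_chain,        # F3
        "POS": pos,              # F4
        "L": lemma,              # F5
        "INFL": inflection,      # F6
        "MV": main_verb,         # F7
        "BMV": int(token == main_verb),  # F8
    }


# Illustrative values only; they are not taken from real parser output.
row = token_features(
    token="hold", conn_sense="Comparison", iob_chain="I-S/B-VP",
    pos="VB", lemma="hold", inflection="", main_verb="hold",
)
print(row)
```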

Inter vs Intra Sentence Arguments


This film should be brilliant. However, it can’t hold up.

Illustration: PREV Feature



This film should be brilliant. However, it can’t hold up.

PREV value per token, first sentence: However However However However However
PREV value per token, second sentence: 0 0 0 0 0
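The illustration suggests that every token of a sentence carries the connective that opens the next sentence, and 0 otherwise. The sketch below implements that reading under stated assumptions: punctuation is dropped so each sentence has five tokens as on the slide, and the function name `prev_feature` and the small connective set are hypothetical.

```python
def prev_feature(sentences, connectives):
    """Return one PREV value per token for each sentence.

    Assumption: a sentence's tokens get the first token of the
    following sentence when that token is a known connective,
    and "0" in every other case.
    """
    rows = []
    for i, sent in enumerate(sentences):
        nxt = sentences[i + 1] if i + 1 < len(sentences) else []
        if nxt and nxt[0].lower() in connectives:
            value = nxt[0]
        else:
            value = "0"
        rows.append([value] * len(sent))
    return rows


sents = [
    ["This", "film", "should", "be", "brilliant"],
    ["However", "it", "can't", "hold", "up"],
]
for sent, feats in zip(sents, prev_feature(sents, {"however"})):
    print(list(zip(sent, feats)))
```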



[Bar chart: P, R and F1 (axis 0 to 0.9) for Intra+Prev vs Inter+Prev; legend: -PREV, +PREV]

P R F1

Intra+Prev 0.77 0.61 0.68

Inter+Prev 0.52 0.27 0.36



Selected Features: Arg2

Features used for Arg1 and Arg2 segmentation and labeling:
F1. Token (T)
F2. Sense of Connective (CONN)
F3. IOB chain (IOB)

Ghosh S., Johansson R., Riccardi G. and Tonelli S., “Shallow Discourse Parsing with Conditional Random Fields”, International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, 2011


Parser Evaluation



Lightweight Features
–  Reduce dimensionality of IOB chain features
–  Control robustness of parser (w.r.t. the syntactic parse)
–  Binary features selected from IOB chain


Lightweight Features

IOB chain feature replaced by two pairs of Boolean features:

(1)   Whether the second top parent node is starting (B) or not
(2)   Whether the third top parent node is starting (B) or not
(3)   Whether the second top parent node is ending (E) or not
(4)   Whether the third top parent node is ending (E) or not

Example: in the tree diagram, the IOB chain feature for the token “flashed” is I-S/E-VP/E-SBAR/E-S/C-VP. The replacement Boolean features for “flashed” are, respectively:
(1)   0 (← E-VP)
(2)   0 (← E-SBAR)
(3)   1 (← E-VP)
(4)   1 (← E-SBAR)
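The replacement can be sketched in a few lines. This assumes the chain is read left to right, so that the second and third elements are the "second top" and "third top" parent nodes; that matches the “flashed” example, where they are E-VP and E-SBAR. The function name `lightweight_features` is hypothetical.

```python
def lightweight_features(iob_chain):
    """Map an IOB chain to the four Boolean features (1)-(4).

    Assumption: the chain is ordered from the top of the tree, so
    index 1 is the second top node and index 2 is the third top node.
    Chains with fewer than three elements yield 0 for missing nodes.
    """
    tags = iob_chain.split("/") + ["", ""]  # pad short chains
    second, third = tags[1], tags[2]
    return (
        int(second.startswith("B-")),  # (1) second top node starting
        int(third.startswith("B-")),   # (2) third top node starting
        int(second.startswith("E-")),  # (3) second top node ending
        int(third.startswith("E-")),   # (4) third top node ending
    )


print(lightweight_features("I-S/E-VP/E-SBAR/E-S/C-VP"))
```

Four binary values replace an open-ended set of chain strings, which is where the dimensionality reduction comes from.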


Parser Evaluation: Arg2 Exact Match

                    P      R      F1
Baseline            0.53   0.46   0.49
Gold-Standard       0.84   0.74   0.79
Gold-Lightweight    0.80   0.74   0.77
AutoConn+GoldSPT    0.82   0.70   0.76
GoldConn+AutoSPT    0.76   0.61   0.68
Lightweight(Auto)   0.72   0.56   0.63
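The F1 column is the harmonic mean of P and R, so it can be recomputed from the other two columns. The short check below does this for the rows of the table; the function name `f1` and the dictionary layout are illustrative choices.

```python
def f1(p, r):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# (P, R) pairs copied from the Arg2 exact-match table.
rows = {
    "Baseline": (0.53, 0.46),
    "Gold-Standard": (0.84, 0.74),
    "Gold-Lightweight": (0.80, 0.74),
    "AutoConn+GoldSPT": (0.82, 0.70),
    "GoldConn+AutoSPT": (0.76, 0.61),
    "Lightweight(Auto)": (0.72, 0.56),
}
for name, (p, r) in rows.items():
    print(f"{name:18s} F1 = {f1(p, r):.2f}")
```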

N-Best Parse Re-ranking

End2End Disc Parse

•  Online Passive-Aggressive Perceptron
•  Structured Voted Perceptron
•  Linear Preference Learning Support Vector Machine
•  Linear Best vs. Rest Support Vector Machine


•  GF0. Log Posteriors
•  GF1. Overgeneration
•  GF2. Undergeneration
•  GF3. Intersentential Arg2
•  GF4. Arg1 after the connective sentence
•  GF5. Argument overlapping with the connective
•  GF6. Argument begins with I- tag
•  GF7. Argument begins with E- tag
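Re-ranking with global features can be pictured as a linear score over each candidate in the n-best list. The sketch below uses a plain perceptron update, which is simpler than any of the four learners listed above and is not the authors' implementation. The two-feature vectors (GF0 log posterior and the GF6 flag) and their values are made up for illustration.

```python
def score(weights, feats):
    """Linear score of one candidate's global feature vector."""
    return sum(w * f for w, f in zip(weights, feats))


def rerank(weights, candidates):
    """Index of the highest-scoring candidate in the n-best list."""
    return max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i]))


def perceptron_update(weights, candidates, oracle_idx, lr=1.0):
    """Move weights toward the oracle candidate if the top choice is wrong."""
    pred = rerank(weights, candidates)
    if pred == oracle_idx:
        return list(weights)
    gold, wrong = candidates[oracle_idx], candidates[pred]
    return [w + lr * (g - p) for w, g, p in zip(weights, gold, wrong)]


# Hypothetical 2-best list: [GF0 log posterior, GF6 begins-with-I- flag].
nbest = [[-0.5, 1.0], [-0.9, 0.0]]
w0 = [1.0, 0.0]                      # start by trusting the posterior only
w1 = perceptron_update(w0, nbest, oracle_idx=1)
print(rerank(w0, nbest), rerank(w1, nbest))
```

After one update the weight on the GF6 flag turns negative, so the candidate whose argument starts with an I- tag is demoted even though its posterior is higher.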

N-Best ReRanking with Global Constraints


Exact match scores. The n-best list size used is given in parentheses.

               Arg1                        Arg2
               P      R      F1            P      R      F1
Baseline       69.88  48.51  57.26         83.44  75.14  79.07
Online PA      66.10  53.92  59.39 (16)    82.59  76.39  79.37 (4)
Struct Per     67.18  52.64  59.03 (4)     82.96  76.28  79.48 (8)
Best vs Rest   66.19  52.83  58.94 (8)     81.69  77.14  79.35 (4)
Pref-Linear    66.54  53.31  59.20 (4)     82.82  76.28  79.42 (4)


Research Challenges

•  Speech, Dialog and Discourse
    –  Speech signal vs linguistic correlates: “Eat your porridge! You’re not going to football practice”
•  Parser
    –  Trade-off between coverage and agreement
    –  Robustness of features
    –  Semantic Annotation
    –  Domain/Genre Adaptation

Research Challenges

•  Speech, Dialog and Discourse
    –  Acoustics vs lexical correlates: “Eat your porridge! You’re not going to football practice”
•  Parser
    –  Trade-off amongst sense-depth, coverage, agreement
    –  Robustness of features
    –  Semantic Annotation
    –  Domain/Genre Adaptation

Publications

Speech (LUNA Corpus)
•  Tonelli S., Riccardi G., Prasad R. and Joshi A., “Annotation of Discourse Relations for Conversational Spoken Dialogs”, LREC, Valletta, 2010

Text (PDTB corpus)
•  Ghosh S., Johansson R., Riccardi G. and Tonelli S., “Shallow Discourse Parsing with Conditional Random Fields”, International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, 2011
•  Ghosh S., Tonelli S., Riccardi G. and Johansson R., “End-to-End Discourse Parser Evaluation”, IEEE International Conference on Semantic Computing, Menlo Park, USA, 2011

