
July 2003 LSA 1

Computational Approaches to Reference

Massimo Poesio (University of Essex)

Lecture 4: Centering Theory

Today’s lecture

A formalization of the notion of ‘in focus’: Centering

Evidence for centering: behavioral; corpora

Centering-based anaphora resolution

Theories of salience & focusing

Fixed number of foci: Sidner’s theory; Centering

Unbounded, but no activation: Strube’s S-List (1998); Henschel, Cheng & Poesio

Activation-based: Kantor; Alshawi / Lappin & Leass; Hajičová

The Grosz and Sidner theory of discourse

Central idea: COHERENCE and SALIENCE go hand in hand.

The theory reintroduces from Grosz (1977) the idea of a separation between ‘local’ and ‘global’ aspects of coherence and salience (the LOCAL FOCUS and the GLOBAL FOCUS).

Two separate theories, one for each component:

Global focus: Grosz and Sidner, 1986

Local focus: Centering theory (Grosz, Joshi and Weinstein, 1983, 1995)

Massimo Poesio:

Gordon’s survey makes it quite clear that the ‘local focus’ here plays a role similar to that of Short Term Memory (STM) in the Kintsch and van Dijk model, but interestingly, Gordon himself seems to assume that it’s the stack that corresponds to the STM! Perverse??? Maybe we should discuss Guindon’s model in the centering chapter, as an alternative to the idea of the local focus as a CF list? (Just as Walker’s cache model would be an alternative to the stack model?)

The Global Focus

At this level of discourse organization, coherence has to do with INTENTIONAL STRUCTURE, i.e., a discourse is perceived as GLOBALLY COHERENT if the intentions expressed by its constituents are related.

ATTENTIONAL STRUCTURE is about situations rather than events: GLOBAL ATTENTION is on FOCUS SPACES, subsets of the global knowledge base.

Three levels of discourse structure:

LINGUISTIC STRUCTURE (cfr. Van Dijk and Kintsch’s ‘linguistic structure’)

INTENTIONAL STRUCTURE: the intentions associated with segments, together with their relations, DOMINANCE and SATISFACTION-PRECEDES

ATTENTIONAL STRUCTURE: a stack of FOCUS SPACES, each associated with an intention, whose position reflects the relations among intentions

Massimo Poesio:

Focus stack not necessarily situation-based

Example: the Grosz 1977 ‘tent’ story

(1) P1: I’m going camping next week-end. Do you have a two-person tent I could borrow?

(2) P2: Sure. I have a two-person backpacking tent.

(3) P1: The last trip I was on there was a huge storm.

(4) It poured for two hours.

(5) I had a tent, but got soaked anyway.

(6) P2: What kind of tent was it?

(7) P1: A tube tent.

(8) P2: Tube tents don’t stand well in a real storm.

(9) P1: True.

Example: the Grosz 1977 ‘tent’ story

(1) P1: I’m going camping next week-end. Do you have a two-person tent I could borrow?

(2) P2: Sure. I have a two-person backpacking tent.

(10) P2: Where are you going on this trip?

(11) P1: Up in the Minarets.

(12) P2: Do you need any other equipment?

(13) P1: No.

(14) P2: OK. I’ll bring the tent in tomorrow.

Intentional structure

DSP1: P1 intends to get a tent from P2

DSP2: P1 explains why P1 needs a tent

DSP1 DOMINATES DSP2

Massimo Poesio:

Note how the idea that discourse structure is determined by intentions differs from ideas like those of Kintsch and van Dijk and, more generally, from ‘situation models’ or ‘event structure’ (cfr. Gordon’s survey)

Intentional structure

DSP1: P1 intends to get a tent from P2

The Focus Space Stack

X1 S1

tent(X1)

S1:of(P2,X1)

DSP1: P1 intends to get a tent from P2

X2 E1 S2

tube-tent(X2)

S2:of(P1,X2)

E1:washed-up(X2)

DSP2: P1 explains why P1 needs a tent

DSP1 DOMINATES DSP2

The Focus Space Stack

X1 S1 E3 X3 X4 E4

tent(X1)

S1:of(P2,X1)

Minarets(X3)

E3:go(P1,X3)

tent(X4)

X4=?

E4:bring(P2,X4)

DSP1: P1 intends to get a tent from P2
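To make the stack behaviour concrete, here is a minimal Python sketch of the attentional state as a stack of focus spaces. It is an illustration under stated assumptions, not Grosz and Sidner’s formalization: the `FocusSpace` and `FocusStack` classes, the hypothetical `ISA` table and the type-matching search are all invented for this example. It shows why, once the focus space for DSP2 has been popped, a later definite such as ‘the tent’ (X4) can only be resolved against X1 in DSP1’s focus space.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical is-a table (an assumption), so that a tube tent also counts as a tent.
ISA = {"tube-tent": "tent"}


@dataclass
class FocusSpace:
    dsp: str                                                # the segment's purpose
    entities: Dict[str, str] = field(default_factory=dict)  # discourse referent -> type


class FocusStack:
    """Attentional state as a stack of focus spaces (simplified sketch)."""

    def __init__(self) -> None:
        self.spaces: List[FocusSpace] = []

    def push(self, space: FocusSpace) -> None:
        # A new (dominated) segment opens.
        self.spaces.append(space)

    def pop(self) -> FocusSpace:
        # The current segment is completed.
        return self.spaces.pop()

    def resolve(self, wanted: str) -> Optional[str]:
        """Search from the top focus space down for an entity of type `wanted`."""
        for space in reversed(self.spaces):
            for ref, typ in space.entities.items():
                if typ == wanted or ISA.get(typ) == wanted:
                    return ref
        return None


stack = FocusStack()
stack.push(FocusSpace("DSP1: P1 intends to get a tent from P2", {"X1": "tent"}))
stack.push(FocusSpace("DSP2: P1 explains why P1 needs a tent", {"X2": "tube-tent"}))
print(stack.resolve("tent"))  # X2: the tube tent is in the top focus space
stack.pop()                   # the DSP2 segment is over
stack.spaces[-1].entities["X3"] = "Minarets"
print(stack.resolve("tent"))  # X1: the tent in DSP1's focus space
```

A real model would also need a way of deciding when a segment ends; here the `pop()` is simply called by hand.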

Other formalizations of the global focus

Reichman’s ‘context space model’ (1981, 1985): context spaces are very similar to focus spaces, but have levels of activation and a richer repertoire of relations

Walker’s cache model (1996, 1998): replaces the stack with a cache

Some evidence

Clearest evidence for the distinction between global focus and local focus: Clark and Sengul’s experiments, discussed in Lecture 1.

Evidence that discourses have a ‘global organization’ and that discourse segments (and associated episodes) become inaccessible:

Experiments by Anderson et al 1983 suggesting that ‘temporally closed’ situations become inaccessible; Lesgold, Roth, and Curtis, 1979; Vonk, Hustinx and Simons, 1992

Corpus work: Grosz’s own work; Chafe’s (1979) analysis of the ‘pear stories’

Evidence relevant to the claim that attentional state is a stack: O’Brien, 1987

But there is also evidence that antecedents which are ‘too far’ are no longer accessible (Walker, 1998; O’Brien et al, 1997)

Massimo Poesio:

Check Rosemary’s notes: situation is a bit nuanced

Good discussion in Garnham, p. 88-91 (although some of the experiments he mentions do not seem terribly relevant)

Even better discussion in Gordon’s survey. But is there any evidence supporting the idea of intentional structure as opposed to event structure? (All the evidence mentioned here is about event structure, as is old work by Garrod and Sanford.) Gordon mentions correlation with prosody and cue phrases; perhaps Vonk et al?

Should also mention that there is a real question whether this global structure can be reliably identified

Perhaps even mention work with Barbara?

And what about ideas from Ali etc that global discourse is entity-structured in certain genres?

The local discourse level

Whereas the global focus theory of Grosz and Sidner (1986) is meant to characterize INTERSEGMENTAL coherence and salience, Centering is meant to characterize INTRASEGMENTAL coherence and salience.

The first claim is that what matters most at this level is ENTITY COHERENCE: discourse segments in which successive utterances keep mentioning the same entities are perceived to be more coherent than discourse segments in which different entities are mentioned each time.

A second important claim is that each utterance has a main CENTER, or CB, and that utterances whose CB is the same as that of the previous one are easier to process.

A third claim is that the entities mentioned by an utterance (‘realized’) are RANKED (cfr. Sidner’s ordering of DFLs). This ranking determines the CB of subsequent utterances, and changes in ranking also make utterances more difficult to process.

Massimo Poesio:

Cfr. Knott, Oberlander and Mellish entity coherence as a global organizing principle?

The local focus: Centering

Centering is often presented as a development of Sidner’s theory, but in fact it is radically different in outlook, and fairly different in its details as well.

Unlike Sidner’s theory, Centering (Joshi and Weinstein, 1979; Grosz, Joshi and Weinstein, 1983; Grosz, Joshi and Weinstein, 1995) is more of a ‘linguistic’ theory than a computational one: its primary aim is to develop a vocabulary for talking about local salience and coherence, rather than specific algorithms.

The precise specification of many of the central concepts (‘ranking’, ‘utterance’, ‘realization’) is left for further research; indeed, it has been claimed that these concepts may be instantiated in different ways in different languages (Walker et al, 1994).

Ranking and local coherence

Grosz et al (1983, 1995): texts that do not have a clear ‘central entity’ feel less coherent

(1) a. John went to his favorite music store to buy a piano.

b.  He had frequented the store for many years.

c.   He was excited that he could finally buy a piano.

d.  He arrived just as the store was closing for the day.

(2) a. John went to his favorite music store to buy a piano.

b. It was a store John had frequented for many years.

c. He was excited that he could finally buy a piano.

d. It was closing just as John arrived.

Local salience and pronominalization

Grosz et al (1995): the CB is also the most salient entity. Texts in which other entities are pronominalized are less felicitous

(1) a. Something must be wrong with John.

b.  He has been acting quite odd.

c.   He called up Mike yesterday.

d.  John wanted to meet him quite urgently.

(2) a. Something must be wrong with John.

b. He has been acting quite odd.

c. He called up Mike yesterday.

d. He wanted to meet him quite urgently.

Uniqueness of the center

Grosz et al (1995) argue, against Sidner, that utterances have a single CB.

(1) a. Susan gave Betsy a pet hamster.

b.  She reminded her that such hamsters were quite shy.

c. She asked Betsy whether she liked the gift.

d. Betsy told her that she really liked the gift.

e. Susan asked her whether she liked the gift.

f. She told Susan that she really liked the gift.

Massimo Poesio:

NB: the one bit that Sidner does not predict is the contrast between c. and e. In cases d. and f., we have a pronoun in AGENT position referring to an entity in non-AGENT position, and vice versa, which could be claimed to result in processing difficulties.

Sidner would also claim that all the pronouns in AGENT position are ambiguous (although it is not clear what she does with ambiguity)

Note also that according to Strube, both entities would be equally ranked.

Concepts and definitions

Every UTTERANCE U in a discourse (segment) DS updates the local attentional state, or local focus, which consists of a PARTIALLY RANKED set of discourse entities, or FORWARD-LOOKING CENTERS (CFs).

An utterance U in discourse segment DS updates the existing set of forward-looking centers by replacing it with the set of CFs REALIZED in U, CF(U,DS) (usually simplified to CF(U)).

The most highly ranked CF realized in utterance U is its PREFERRED CENTER, CP(U).

(1) u1. Susan gave James a pet hamster.

CF(u1) = [Susan,James,pet hamster]. CP(u1) = Susan

(2) u2. She gave Peter a nice scarf.

CF(u2) = [Susan,Peter,nice scarf]. CP(u2) = Susan
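A minimal Python sketch of how CF(U) and CP(U) could be computed, assuming the grammatical-function ranking discussed under Ranking and assuming pronouns have already been resolved. The function names, the split of objects into `obj`/`obj2` for double-object verbs and the choice to keep each entity only once are assumptions of this sketch (the BFP example later keeps repeated mentions in its CF lists).

```python
from typing import List, Tuple

# Lower number = higher rank ("SUBJ < OBJ < OTHERS").
# Splitting OBJ into obj/obj2 for double-object verbs is an assumption.
GF_RANK = {"subj": 0, "obj": 1, "obj2": 2, "other": 3}


def cf_list(mentions: List[Tuple[str, str]]) -> List[str]:
    """Return CF(U), highest-ranked entity first.

    `mentions` lists (entity, grammatical function) pairs in linear order.
    sorted() is stable, so mentions with the same function keep their linear
    order; an entity realized twice is kept once, at its highest-ranked position.
    """
    ranked = sorted(mentions, key=lambda m: GF_RANK[m[1]])
    return list(dict.fromkeys(entity for entity, _ in ranked))


def cp(cf: List[str]) -> str:
    """CP(U) is the first (most highly ranked) element of CF(U)."""
    return cf[0]


# u1. Susan gave James a pet hamster.
cf_u1 = cf_list([("Susan", "subj"), ("James", "obj"), ("pet hamster", "obj2")])
# u2. She gave Peter a nice scarf.  ('She' already resolved to Susan)
cf_u2 = cf_list([("Susan", "subj"), ("Peter", "obj"), ("nice scarf", "obj2")])
print(cf_u1, cp(cf_u1))  # ['Susan', 'James', 'pet hamster'] Susan
print(cf_u2, cp(cf_u2))  # ['Susan', 'Peter', 'nice scarf'] Susan
```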

Massimo Poesio:

Add examples of utterances and CFs!

The CB: Examples

(1) u1. Susan gave James a pet hamster.

CF(u1) = [Susan,James,pet hamster]. CB = undefined CP=Susan

(2) u2. She gave Peter a nice scarf.

CF(u2) = [Susan,Peter,nice scarf]. CB=Susan. CP=Susan

NB: The CB is not always the most highly ranked entity of the PREVIOUS utterance …

(2’) u2. He loves hamsters.

CF(u2) = [James]. CB=James. CP=James

… or the most highly ranked entity of the CURRENT one

(2’’) u2. Peter gave her a nice scarf.

CF(u2) = [Peter,Susan, nice scarf]. CB=Susan. CP=Peter

Transitions

Grosz et al proposed that the load involved in processing an utterance depends on whether that utterance preserves the CB of the previous utterance or not, and on whether CB(U) is also CP(U). They introduce the following classification:

CENTER CONTINUATION: Ui is a continuation if CB(Ui) = CB(Ui-1), and CB(Ui) = CP(Ui)

CENTER RETAIN: Ui is a retain if CB(Ui) = CB(Ui-1), but CB(Ui) is different from CP(Ui)

CENTER SHIFT: Ui is a shift if CB(Ui) ≠ CB(Ui-1)
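The three-way classification can be sketched as a small Python function. It assumes CB and CP have already been computed; the function name and the treatment of an undefined CB(Ui-1) (counted as ‘same CB’, in the spirit of the convention adopted by Brennan et al 1987) are assumptions, not part of Grosz et al’s definitions.

```python
from typing import Optional


def classify(cb: Optional[str], cp: str, prev_cb: Optional[str]) -> str:
    """Classify Ui as CONTINUE, RETAIN or SHIFT.

    cb = CB(Ui), cp = CP(Ui), prev_cb = CB(Ui-1) (None if undefined).
    Assumption: an undefined CB(Ui-1) is treated as compatible with any CB(Ui).
    """
    if cb is None:
        raise ValueError("CB(Ui) is undefined, so Ui cannot be classified")
    if prev_cb is None or cb == prev_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SHIFT"


# The alternative versions of u2 from the slides, with CB(u1) = Susan.
print(classify("Susan", "Susan", "Susan"))  # CONTINUE: (2) She gave Peter a nice scarf.
print(classify("James", "James", "Susan"))  # SHIFT: (2') He loves hamsters.
print(classify("Susan", "Peter", "Susan"))  # RETAIN: (2'') Peter gave her a nice scarf.
```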

Utterance classification

(0) u0. Susan is a generous person.

CF(u0) = [Susan] CB = undefined CP = Susan.

(1) u1. She gave James a pet hamster.

CF(u1) = [Susan,James,pet hamster]. CB = Susan CP=Susan

(2) u2. She gave Peter a nice scarf.

CF(u2) = [Susan,Peter,nice scarf]. CB=Susan. CP=Susan CONTINUE

SHIFT:

(2’) u2. He loves hamsters.

CF(u2) = [James]. CB=James. CP=James SHIFT

RETAIN:

(2’’) u2. Peter gave her a nice scarf.

CF(u2) = [Peter,Susan, nice scarf]. CB=Susan. CP=Peter RETAIN


Massimo Poesio:

Note that you need to establish the CB first; see Walker et al 1994, Kameyama 1998, etc.

Main claims

CONSTRAINT 1: All utterances of a segment except for the first have exactly one CB

RULE 1: if any CF is pronominalized, the CB is.

RULE 2: (Sequences of) continuations are preferred over (sequences of) retains, which are preferred over (sequences of) shifts.
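Constraint 1 and Rule 1 can be phrased as simple checks over an utterance’s CB and the set of entities it pronominalizes. This is a hypothetical Python sketch: the function names are invented, and treating Rule 1 as inapplicable when there is no CB is an assumption of the sketch.

```python
from typing import Optional, Set


def constraint1_ok(cb: Optional[str], is_first: bool) -> bool:
    """CONSTRAINT 1: every utterance of a segment except the first has a CB.

    When the CB is computed by Constraint 3 it is unique by construction,
    so this check can only detect the 'no CB' kind of violation.
    """
    return is_first or cb is not None


def rule1_ok(cb: Optional[str], pronominalized: Set[str]) -> bool:
    """RULE 1: if any CF is pronominalized, the CB is.

    Assumption: when the utterance has no CB, Rule 1 is taken not to apply.
    """
    if cb is None or not pronominalized:
        return True
    return cb in pronominalized


# d. John wanted to meet him quite urgently.  (CB = John, 'him' = Mike)
print(rule1_ok("John", {"Mike"}))            # False: Rule 1 violated
# d. It must have been four o'clock in the morning.  (no CB)
print(constraint1_ok(None, is_first=False))  # False: Constraint 1 violated
```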

Violations of the claims

A violation of Rule 1:

(1) a. Something must be wrong with John.

b. He has been acting quite odd. CB=John

c. He called up Mike yesterday. CB=John

d. John wanted to meet him quite urgently. CB=John

Violations of Constraint 1:

(2) a. Something must be wrong with John.

b. He has been acting quite odd.

c. He called up Mike yesterday.

d. It must have been four o’clock in the morning. CB=undef

(3) a. Something must be wrong with John.

b. He has been acting quite odd.

c. He and Susan had a fight yesterday.

d. He didn’t want her to go to the party. CB=John, CB=Susan

Massimo Poesio:

Emphasize what the claims say: these are preferences that make a text easier or harder to read!

The parameters of the theory

Grosz et al do not provide algorithms for computing any of the notions used in the basic definitions:

UTTERANCE; PREVIOUS UTTERANCE; REALIZATION; RANKING

What counts as a ‘PRONOUN’ for the purposes of Rule 1? (Only personal pronouns? Or demonstrative pronouns as well? What about second person pronouns?)

One of the reasons for the success of the theory is that it provides plenty of scope for theorizing …

The CB

A second CF is singled out as the BACKWARD-LOOKING CENTER, or CB: Centering’s implementation of the notion of ‘topic’ or, better, of ‘main character’ in the sense of Garrod and Sanford (1988).

Originally, the CB was only characterized in intuitive terms. Ever since Grosz, Joshi and Weinstein (1986, 1995), the CB has been DEFINED as follows:

CONSTRAINT 3: CB(Ui) is the highest-ranked element of CF(Ui-1) that is realized in Ui

Note, however, that other characterizations of the CB have been proposed, e.g., by Gordon et al (1993) and Passonneau (1993).
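Constraint 3 translates almost directly into code. In this Python sketch, `backward_center` is a hypothetical name, CF lists are assumed to be ordered from highest to lowest rank, and realization is restricted to direct realization (one of the open parameters of the theory).

```python
from typing import Iterable, List, Optional


def backward_center(prev_cf: List[str], realized: Iterable[str]) -> Optional[str]:
    """CB(Ui) per Constraint 3: the highest-ranked element of CF(Ui-1)
    that is realized in Ui, or None when no element of CF(Ui-1) is.

    'Realized' is simplified to 'the entity appears in `realized`',
    i.e. direct realization only.
    """
    realized_set = set(realized)
    for entity in prev_cf:  # prev_cf is ordered from highest to lowest rank
        if entity in realized_set:
            return entity
    return None


cf_u1 = ["Susan", "James", "pet hamster"]  # u1. Susan gave James a pet hamster.
print(backward_center(cf_u1, ["Susan", "Peter", "nice scarf"]))  # Susan
print(backward_center(cf_u1, ["James"]))                          # James
print(backward_center(cf_u1, ["Peter", "Susan", "nice scarf"]))  # Susan
```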

Utterance and Previous Utterance

Originally, utterances were implicitly identified with sentences. Later, however, Kameyama (1998) and others suggested identifying utterances with finite clauses. If utterances are identified with sentences, the previous utterance is generally easy to identify (except for texts with titles, etc.). But if utterances are identified with finite clauses, there are various ways of dealing with cases like:

(u1) John wanted to leave home (u2) before Bill came home. (u3) He would be drunk as usual.

Kameyama: PREV(u3) = u2

Suri and McCoy: PREV(u3) = u1

Realization

A basic question is whether entities can be ‘indirectly’ realized in utterances by an associate (as in Sidner’s algorithm):

(u1) John walked towards the house. (u2) THE DOOR was open.

A second question is whether first and second person entities are realized:

(u1) Before you buy this medicine, (u2) you should contact your doctor.

Realization greatly affects Constraint 1.

Ranking

The most studied parameter.

GRAMMATICAL FUNCTION (Kameyama 1986, Grosz Joshi and Weinstein 1986, Brennan et al 1987, Hudson et al 1986, Gordon et al 1993):

SUBJ < OBJ < OTHERS (X < Y: X is ranked higher than Y)

A student was here to see John today: A STUDENT < JOHN

INFORMATION STATUS (Strube and Hahn, 1999):

HEARER-OLD < MEDIATED < HEARER-NEW

A student was here to see John today: JOHN < A STUDENT

THEMATIC ROLES (Cote, 1998)

FIRST MENTION / LINEAR ORDER (Rambow, 1993; Gordon et al, 1993):

In Lisa’s opinion, John shouldn’t have done that

Ranking is also the one parameter that is supposed to vary across languages.

E.g., ranking in Japanese and Turkish has been claimed to involve additional functions (Walker et al 1994; Turan, 1995)
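To see how the two best-known ranking proposals diverge, here is a Python sketch that ranks the mentions in ‘A student was here to see John today’ both ways. The grammatical-function and information-status labels are assigned by hand (an assumption), and leaving ‘today’ out of the CF list is a simplification.

```python
# "A student was here to see John today."
# (entity, grammatical function, information status), labels assigned by hand.
mentions = [
    ("a student", "subj", "hearer-new"),
    ("John", "obj", "hearer-old"),
]

GF_RANK = {"subj": 0, "obj": 1, "other": 2}
INFO_RANK = {"hearer-old": 0, "mediated": 1, "hearer-new": 2}

# sorted() is stable, so ties fall back to linear order.
by_gf = [m[0] for m in sorted(mentions, key=lambda m: GF_RANK[m[1]])]
by_info = [m[0] for m in sorted(mentions, key=lambda m: INFO_RANK[m[2]])]

print(by_gf)    # ['a student', 'John']
print(by_info)  # ['John', 'a student']
```

Because the sort is stable and the mentions are listed in linear order, ties under either scheme fall back to linear order.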

Massimo Poesio:

A problem for Strube: how do you account for all that evidence about subject assignment in English?

Empirical evaluations of Centering theory

Some of the evidence about the general architecture of focusing discussed in connection with Sidner’s theory is relevant for Centering as well.

In particular, evidence concerning the interaction with commonsense knowledge, and concerning serial vs. parallel processing.

Constraint 1: supported by work such as Ehrlich and Johnson-Laird (1982).

Rule 1: quite a lot of psychological results, whose connection with Centering is, however, not always so direct:

Hudson, Tanenhaus, and Dell, 1986; Gordon et al, 1993 and subsequent papers (Gordon and Chan, 1995; Gordon and Scearce, 1995; Gordon et al, 1999); Brennan, 1995

Rule 2: no evidence for these preferences (e.g., Gordon et al 1993, Gordon and Scearce 1995).

Several algorithms based on Centering theory have been proposed (Brennan et al, 1987; Strube and Hahn, 1999; Tetreault, 2001) and evaluated using annotated corpora.

Poesio et al, 2000, 2002: evaluate the claims of the theory by trying various possible parameter settings and finding the one which minimizes the violations of the claims.

Massimo Poesio:

On Constraint 1: reference to Ehrlich and Johnson-Laird in Gordon et al 1993. Gordon’s survey also mentions Kintsch, Kozminsky, Streby, McKoon and Keenan 1975, and Manelis and Yekovich 1976 (p. 34)

Gordon also sees the 1993 experiments as relevant for Constraint 1

For Strube / Hahn ranking: Sanford, Moar and Garrod and our naming paper

Ranking: Hudson, Tanenhaus, and Dell

Hudson, Tanenhaus, and Dell (1986) ran some reading time experiments using materials such as the following, in which ‘Jack’ is made more highly ranked in a. by being the subject and by the choice of an NP1 verb:

a. Jack apologised profusely to Josh.

b1. He had been rude to Josh yesterday.

b2. Jack had been rude to Josh yesterday.

b3. He had been offended by Jack’s comment.

b4. Josh had been offended by Jack’s comment.

Results: RT(b1) << RT(b2) = RT(b4) << RT(b3); b3 is the continuation that violates Rule 1.

Massimo Poesio:

Note that the first entity is most highly ranked both by subject and by implicit causality verbs

Some comments

Notice that while Centering predicts problems with b3, it does not predict the faster reading times for b1 (see also the discussion of Gordon et al’s experiments, next).

In fact, in order to claim that the slow reading time for b3 is consistent with Rule 1, we have to assume that when the reader encounters the pronoun she already knows what the CB is going to be.

Furthermore, the materials do not really allow us to tell whether the effect here is due to a true focusing effect, or is only a subject assignment preference.

Finally, in this case there is no difference between the predictions of a theory of ranking based on grammatical function and those of the theory proposed by Strube and Hahn (which uses linear order to break ties).

Massimo Poesio:

Remark on the need for an incremental version of centering to evaluate these claims (at least two exist; see also Kehler, 1997)

Ranking and pronominalization: Gordon, Grosz and Gilliom, 1993

A series of reading time studies that revealed a REPEATED NAME PENALTY (RNP): an increased reading time when a proper name is used instead of a pronoun

PRO-PRO:

(1) a. Bruno was the bully of the neighborhood.

b.  He chased Tommy all the way home one day.

c.   He watched him hide behind a big tree and start to cry.

d.  He yelled at him so loudly that all the neighbors came outside.

PRO-NAME:

(1) a. Bruno was the bully of the neighborhood.

b.  He chased Tommy all the way home one day.

c.   He watched Tommy hide behind a big tree and start to cry.

d.  He yelled at Tommy so loudly that all the neighbors came outside.

Ranking and pronominalization: Gordon, Grosz and Gilliom, 1993

NAME-PRO:

(1) a. Bruno was the bully of the neighborhood.

b.  Bruno chased Tommy all the way home one day.

c.   Bruno watched him hide behind a big tree and start to cry.

d.  Bruno yelled at him so loudly that all the neighbors came outside.

Repeated Name Penalty

Reading time per passage, by condition (bar chart in the original slide; y-axis from 7000 to 9000):

Condition     Reading time (passages)
Pro-Pro       7581
Pro-Name      7624
Name-Name     8460
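The size of the penalty in each condition is simply its difference from the Pro-Pro baseline. A short Python computation over the three reading times of this slide (in the chart’s own units):

```python
# Reading times per passage from the Repeated Name Penalty slide.
reading_time = {"Pro-Pro": 7581, "Pro-Name": 7624, "Name-Name": 8460}

baseline = reading_time["Pro-Pro"]
penalty = {cond: t - baseline for cond, t in reading_time.items()}
print(penalty)  # {'Pro-Pro': 0, 'Pro-Name': 43, 'Name-Name': 879}
```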

Ranking and pronominalization: entities subject to RNP

Gordon et al only observed an RNP for entities in SUBJECT position referring to either the FIRST MENTIONED entity or the SUBJECT of the previous utterance (Exp. 2 and 3).

Gordon et al: these results support both Constraint 1 and Rule 1. They suggest replacing the definition of the CB with one based on the RNP.

A difference: see (2b) below.

(2) a. Lisa gave Fred a pet hamster.

b. In her / Lisa’s opinion, a hamster was the best present for him/Fred.

c. In his / Fred’s opinion, she/Lisa shouldn’t have done that.

Massimo Poesio:

Exp 2 for b, exp 3 for c

Add data supporting the claim that first mention and subject give the same ranking?

(1) a. Susan gave Fred a pet hamster.

b. In his / Fred’s opinion, she/Susan shouldn’t have done that.

c. She/Susan just assumed that anyone would love a hamster.

c’. He/Fred doesn’t have anywhere to put a cage.

A few comments

The RNP is a very interesting result, but a lot of people wonder whether it really makes sense to take it as a verification of Centering, at least in its ‘classical’ version

Gordon et al didn’t find an RNP for entities that would be CBs according to the original definition

Even if we accept Gordon et al’s suggestion that we should modify the theory by dropping the definition in Constraint 3 and adopting the RNP as an operational test for the CB, we would still need to modify Rule 1 – which in its classical form does not REQUIRE the CB to be pronominalized.

Other Gordon experiments

Gordon and Scearce, 1995: focusing generates hypotheses independently of commonsense knowledge

Gordon and Chan, 1995: ranking depends on subjecthood rather than agenthood

Gordon et al, 1999: ranking in complex NPs (coordinated NPs, possessive NPs) depends on structural factors rather than linear order

Massimo Poesio:

Skip this for now

Check Garnham and survey paper by Gordon

Corpus-based evaluation

Notions from Centering have been used in a number of studies, especially of the connection between status in the local focus and NP form.

Passonneau (1993): comparison of uses of IT and THAT

IT is primarily used to refer to LOCAL CENTERs; THAT to entities which are not local centers

Di Eugenio (1992, 1998): ‘weak’ vs. ‘strong’ pronouns in Italian

‘weak’ pronouns are used to maintain the CB; ‘strong’ pronouns for shifting

Poesio et al 2000, submitted: A corpus-based evaluation of Centering

Using the GNOME corpus to compare ‘parameter configurations’, with the number of violations of Constraint 1, Rule 1, and Rule 2 as metrics.

Can be used on-line: http://cswww.essex.ac.uk/staff/poesio/cbc
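As a rough illustration of the metric (not the code behind the GNOME study or the on-line tool), this hypothetical Python sketch counts, for one segment, the percentage of non-initial utterances that have no CB (Constraint 1) and the percentage in which something is pronominalized but the CB is not (Rule 1). Rule 2 is left out, and ranked CF lists and pronoun sets are assumed to be given.

```python
from typing import List, Optional, Sequence, Set, Tuple

# One utterance = (CF list ranked highest-first, entities realized by pronouns).
Utterance = Tuple[List[str], Set[str]]


def cb_of(prev_cf: List[str], cf: List[str]) -> Optional[str]:
    """Constraint 3: the highest-ranked element of CF(Ui-1) realized in Ui."""
    return next((e for e in prev_cf if e in cf), None)


def violation_rates(segment: Sequence[Utterance]) -> Tuple[float, float]:
    """Percentages of non-initial utterances violating Constraint 1 and Rule 1."""
    n = len(segment) - 1  # Constraint 1 exempts the first utterance
    if n <= 0:
        return 0.0, 0.0
    c1 = r1 = 0
    for (prev_cf, _), (cf, pronouns) in zip(segment, segment[1:]):
        cb = cb_of(prev_cf, cf)
        if cb is None:
            c1 += 1  # no CB at all
        elif pronouns and cb not in pronouns:
            r1 += 1  # something is pronominalized, but not the CB
    return 100 * c1 / n, 100 * r1 / n


john = [
    (["John"], set()),             # a. Something must be wrong with John.
    (["John"], {"John"}),          # b. He has been acting quite odd.
    (["John", "Mike"], {"John"}),  # c. He called up Mike yesterday.
    ([], set()),                   # d. It must have been four o'clock in the morning.
]
print(violation_rates(john))  # one Constraint 1 violation out of three utterances
```

Comparing ‘parameter configurations’ would then amount to recomputing the CF lists under each configuration and comparing the resulting percentages.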

The trade-off between Constraint 1 and Rule 1: Utterance parameters

[Chart: percentage of violations of Constraint 1 (left axis, 0-60) and of Rule 1 (right axis, 0-5) for several utterance parameter settings (x-axis), including ‘vanilla’, ‘coord-vp’, ‘verbed’ and ‘vanilla-no-relatives’.]

Algorithms based on centering theory

Anaphora resolution:

Brennan, Friedman and Pollard, 1987 (BFP)

‘Basic algorithm’, Strube and Hahn 1999

Incremental algorithms: Strube, 1998; Tetreault, 1999, 2001

Generation:

Text planning: Kibble and Power, 2000; Karamanis, 2001, 2002

NP realization: Henschel, Cheng, and Poesio, 2000

A number of evaluation studies:

Walker, 1989 (BFP vs. Hobbs)

Strube and Hahn, 1999 (SH vs. BFP)

Strube, 1998 (S-LIST vs. BFP)

Tetreault, 2001 (History List vs. Hobbs vs. BFP vs. S-LIST vs. LRC)

Massimo Poesio:

Have to add here discussion of Tetreault

Brennan et al (1987)

The first and, arguably, still best-known algorithm for pronoun resolution based on Centering was proposed by Brennan, Friedman and Pollard (1987).

Parameter configuration:

Utterances: sentences

Ranking: grammatical function

Realization: direct??

Along the way, Brennan et al also developed the best-known formalization of Centering.

E.g., the terminology of ‘Constraints’ and ‘Rules’, or the division of ‘Shifts’ into ‘Smooth Shifts’ and ‘Rough Shifts’

The algorithm

1. GENERATE possible Cb-Cf combinations (or anchors)

2. FILTER these anchors by constraints:

a. Binding theory

b. Sortal predicates

c. Centering rules and constraints

3. RANK the remaining anchors according to transition preferences: CONTINUE < RETAIN < SMOOTH-SHIFT < ROUGH-SHIFT
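The GENERATE / FILTER / RANK steps can be sketched in Python as follows. This is a simplified illustration, not Brennan et al’s original implementation: only pronouns are modelled, sortal filtering is assumed to have happened when building `candidates`, binding theory is replaced by a hypothetical `disjoint` list of pronoun pairs that may not corefer, and ties are broken by generation order. Run on u4 of the example that follows, it selects the CONTINUE anchor <Terry, [Terry, Tony]>.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Transition preferences, most preferred first (CONTINUE < RETAIN < ...).
PREFERENCE = ["CONTINUE", "RETAIN", "SMOOTH-SHIFT", "ROUGH-SHIFT"]


def transition(cb: Optional[str], cp: str, prev_cb: Optional[str]) -> str:
    """Four-way transition; an undefined CB(Ui-1) counts as 'same CB'."""
    if prev_cb is None or cb == prev_cb:
        return "CONTINUE" if cb == cp else "RETAIN"
    return "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"


def bfp_resolve(pronouns: Sequence[str],
                candidates: Dict[str, List[str]],
                prev_cf: List[str],
                prev_cb: Optional[str],
                disjoint: Sequence[Tuple[int, int]] = ()):
    """Return (CB, CF, transition) for the best anchor, or None.

    pronouns:   pronoun labels ordered by grammatical function (subject
                first), so an assignment of referents is already a CF list.
    candidates: possible referents per pronoun (assumed sortally filtered).
    disjoint:   index pairs of pronouns that may not corefer (hypothetical
                stand-in for binding theory).
    """
    if not pronouns:
        return None
    # 1. GENERATE: each assignment of referents yields a CF list; each entity
    #    in it, or NIL (None), is a candidate CB.
    anchors = []
    for cf in product(*(candidates[p] for p in pronouns)):
        for cb in list(dict.fromkeys(cf)) + [None]:
            anchors.append((cb, list(cf)))

    # 2. FILTER
    kept = []
    for cb, cf in anchors:
        if any(cf[i] == cf[j] for i, j in disjoint):  # a. binding stand-in
            continue
        # c. Constraint 3: the CB must be the highest-ranked element of
        #    CF(Ui-1) realized in Ui.  Rule 1 holds trivially here, since
        #    every modelled CF is realized by a pronoun.
        if cb != next((e for e in prev_cf if e in cf), None):
            continue
        kept.append((cb, cf))
    if not kept:
        return None

    # 3. RANK by transition preference; min() keeps the first of any ties.
    cb, cf = min(kept, key=lambda a: PREFERENCE.index(transition(a[0], a[1][0], prev_cb)))
    return cb, cf, transition(cb, cf[0], prev_cb)


# u4. He called him at 6AM.
print(bfp_resolve(["he", "him"],
                  {"he": ["Terry", "Tony"], "him": ["Terry", "Tony"]},
                  prev_cf=["Terry", "Tony", "Terry", "a sailing expedition"],
                  prev_cb="Terry", disjoint=[(0, 1)]))
```

With `pronouns=["he"]`, candidates Terry and Tony, and `prev_cf=["Terry", "Tony"]` (u5), the sketch again prefers the CONTINUE anchor with CB = Terry over the SMOOTH-SHIFT anchor with CB = Tony.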

An example

(1) u1. Terry really goofs sometimes.

(2) u2. Yesterday was a beautiful day and he was excited about trying out his new sailboat.

(3) u3. He wanted Tony to join him on a sailing expedition.

(4) u4. He called him at 6AM.

(5) u5. He was sick and furious at being woken up so early.


Analysis of the example

(1) u1. Terry really goofs sometimes.

CB = NIL CF = [Terry]

(2) u2. Yesterday was a beautiful day and he was excited about trying out his new sailboat.

Referring expressions = [yesterday, A1, A2, the sailboat]
Possible CF lists: [yesterday, Terry, Terry, the sailboat]
Anchors = a1. <Terry, [yesterday, Terry, Terry, the sailboat]>
          a2. <NIL, [yesterday, Terry, Terry, the sailboat]>

Filter out a2: Terry, the only element of CF(u1), is realized in u2, so the CB cannot be NIL.

CB = Terry
CF = [yesterday, Terry, Terry, the sailboat]
transition = ESTABLISH / CONTINUE


Example, cont’d

(3) u3. He wanted Tony to join him on a sailing expedition.

CB = Terry
CF = [Terry, Tony, Terry, a sailing expedition]

(4) u4. He called him at 6AM.

Referring expressions = [A3, A4]
Possible CF lists: [Terry, Terry], [Terry, Tony], [Tony, Terry], [Tony, Tony]

Anchors = a1. <Terry, [Terry, Terry]>
          a2. <NIL, [Terry, Terry]>
          a3. <Tony, [Terry, Terry]>
          …..

Filter out all anchors except <Terry, [Terry, Tony]> (CONTINUE) and <Terry, [Tony, Terry]> (RETAIN): binding theory rules out He and him referring to the same person, and the CB must be Terry, the highest-ranked element of CF(u3) realized in u4.

CB = Terry
CF = [Terry, Tony]

transition = CONTINUE


Example, end

(5) u5. He was sick and furious at being woken up so early.

Referring expressions: [A1]
Possible CF lists: [Terry], [Tony]
Possible anchors:

  <Terry, [Terry]> (CONTINUE)
  <Tony, [Tony]> (SMOOTH-SHIFT)

CONTINUE is ranked above SMOOTH-SHIFT, so:

CB = Terry
CF = [Terry]


A more complex example

(1) u1. Susan gave Betsy a pet hamster.

CB = NIL CF = [Susan, Betsy]

(2) u2. She reminded her that such hamsters were quite shy.

Referring expressions = [A1, A2, hamsters]
Possible CF lists: [Susan, Betsy, hamsters], [Betsy, Susan, hamsters]

Anchors = a1. <Susan, [Susan, Betsy]>
          a2. <Betsy, [Susan, Betsy]>
          a3. <NIL, [Susan, Betsy]>
          a4. <Susan, [Betsy, Susan]>
          a5. <Betsy, [Betsy, Susan]>
          a6. <NIL, [Betsy, Susan]>

Filter out a2, a3, a5, a6: both Susan and Betsy are realized in u2, so the CB must be Susan, the higher-ranked of the two in CF(u1).
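The filtering step for u2 can be checked mechanically. The sketch below uses a hypothetical helper, expected_cb, and abbreviates the CF lists as the slide does; it keeps only the anchors whose CB is the highest-ranked element of CF(u1) realized in u2.

```python
from typing import Optional, Sequence


def expected_cb(prev_cf: Sequence[str], cf: Sequence[str]) -> Optional[str]:
    """Highest-ranked element of CF(U_{n-1}) realized in U_n (None = NIL)."""
    return next((e for e in prev_cf if e in cf), None)


prev_cf = ["Susan", "Betsy"]  # CF(u1) as given on the slide
anchors = {                   # the six anchors a1..a6 for u2
    "a1": ("Susan", ["Susan", "Betsy"]),
    "a2": ("Betsy", ["Susan", "Betsy"]),
    "a3": (None, ["Susan", "Betsy"]),
    "a4": ("Susan", ["Betsy", "Susan"]),
    "a5": ("Betsy", ["Betsy", "Susan"]),
    "a6": (None, ["Betsy", "Susan"]),
}

# FILTER: keep anchors whose CB is the one the constraint predicts.
survivors = [name for name, (cb, cf) in anchors.items()
             if cb == expected_cb(prev_cf, cf)]
print(survivors)  # ['a1', 'a4']

# CB(u1) is NIL, so a survivor whose CB is also its first CF element is
# CONTINUE-type (a1); the other survivor (a4) is a RETAIN.
continues = [n for n in survivors if anchors[n][0] == anchors[n][1][0]]
print(continues)  # ['a1']
```

Under BFP's transition ranking, a1 would then be preferred over a4, i.e. She = Susan and her = Betsy.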


Kehler, 1997

(1) u1. Terry gets really angry sometimes.

(2) u2. Yesterday was a beautiful day and he was excited about trying out his new sailboat.

(3) u3. He wanted Tony to join him on a sailing expedition, and left him a message on his answering machine.

(4) u4. Tony called him at 6AM the next morning. (RETAIN)

(5) u5. He was furious for being woken up so early.
    He = Terry: CONTINUE. He = Tony: SMOOTH-SHIFT.

(5’) u5. He was furious with him for being woken up so early. (CB = Tony)
    He = Tony, him = Terry: SMOOTH-SHIFT. He = Terry, him = Tony: ROUGH-SHIFT.

(5’’) u5. He was furious with Tony for being woken up so early. (CB = Tony)
    He = Terry: violation of Rule 1.
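Rule 1, as invoked for (5’’), can be stated as a small check. This is a sketch under simplifying assumptions: each entity mentioned in u5 is tagged as realized by a 'pronoun' or a 'name', CF(u4) is reduced to [Tony, Terry], and the function names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def expected_cb(prev_cf: Sequence[str], realized: Dict[str, str]) -> Optional[str]:
    """CB(U_n): highest-ranked element of CF(U_{n-1}) realized in U_n."""
    return next((e for e in prev_cf if e in realized), None)


def violates_rule1(prev_cf: Sequence[str], realized: Dict[str, str]) -> bool:
    """Rule 1: if any element of CF(U_{n-1}) is realized by a pronoun in U_n,
    then CB(U_n) must be realized by a pronoun as well.
    `realized` maps each entity mentioned in U_n to 'pronoun' or 'name'."""
    cb = expected_cb(prev_cf, realized)
    if cb is None:
        return False
    any_pronoun = any(realized.get(e) == "pronoun" for e in prev_cf)
    return any_pronoun and realized[cb] != "pronoun"


cf_u4 = ["Tony", "Terry"]  # simplified CF(u4): subject Tony, object him = Terry
print(violates_rule1(cf_u4, {"Terry": "pronoun", "Tony": "name"}))     # (5''): True
print(violates_rule1(cf_u4, {"Tony": "pronoun", "Terry": "pronoun"}))  # (5'): False
```

In (5’’) the CB is Tony, realized by a name, while Terry is pronominalized, so the check fires; in (5’) both are pronouns and Rule 1 is satisfied under either reading.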


Tetreault 2001: LRC (Left-Right Centering)

1. Process all references to discourse entities in utterance Un from left to right. When a pronoun is encountered,

a. Search for an antecedent intrasententially in the list of all processed CFs in Un that meet feature and binding constraints.

b. If none is found, search for an antecedent intersententially in CF(Un-1) that satisfies agreement and binding constraints.

2. Create CF(Un) by ranking its discourse entities according to grammatical function. (In the implementation, this ranking is approximated by a left-to-right, breadth-first walk of the parse tree.)

3. Compute CB(Un)

4. Compute the transition.
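The four LRC steps can be sketched in Python as below. This is a minimal sketch under stated assumptions, not Tetreault's implementation: mentions carry hypothetical gender, number and clause features; grammatical-function ranking is approximated by left-to-right order (as in the implementation note above); candidates are scanned in the order they were processed; and binding theory is replaced by a crude clause-mate rule that treats every pronoun as non-reflexive. Mention, acceptable and lrc are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    entity: Optional[str]  # discourse entity (None for a not-yet-resolved pronoun)
    gender: str            # hypothetical encoding: "m", "f", "n"
    number: str            # "sg" or "pl"
    clause: int = 0        # index of the clause containing the mention
    pronoun: bool = False


def acceptable(p: Mention, cand: Mention, processed: List[Mention]) -> bool:
    """Agreement plus a crude stand-in for binding theory (assumption): a
    pronoun may not corefer with a mention already processed in its clause."""
    if (p.gender, p.number) != (cand.gender, cand.number) or cand.entity is None:
        return False
    clause_mates = {c.entity for c in processed if c.clause == p.clause}
    return cand.entity not in clause_mates


def lrc(utt: List[Mention], prev_cf: List[Mention], prev_cb: Optional[str]
        ) -> Tuple[List[Optional[str]], Optional[str], str]:
    processed: List[Mention] = []  # entities processed so far in U_n
    for m in utt:                  # step 1: left to right
        if m.pronoun:
            # 1a: intrasentential search among already-processed mentions
            ante = next((c for c in processed if acceptable(m, c, processed)), None)
            # 1b: otherwise, intersentential search in CF(U_{n-1}), ranked order
            if ante is None:
                ante = next((c for c in prev_cf if acceptable(m, c, processed)), None)
            m = replace(m, entity=ante.entity if ante else None)
        processed.append(m)
    # Step 2: left-to-right order stands in for grammatical-function ranking.
    cf = [m.entity for m in processed]
    # Step 3: CB = highest-ranked element of CF(U_{n-1}) realized in U_n.
    cb = next((c.entity for c in prev_cf if c.entity in cf), None)
    # Step 4: transition (an undefined previous CB counts as "same CB").
    cp = cf[0] if cf else None
    if prev_cb is None or cb == prev_cb:
        trans = "CONTINUE" if cb == cp else "RETAIN"
    else:
        trans = "SMOOTH-SHIFT" if cb == cp else "ROUGH-SHIFT"
    return cf, cb, trans
```

For u4 of the Terry/Tony example ("He called him at 6AM"), with CF(u3) = [Terry, Tony, Terry] and CB(u3) = Terry, the sketch resolves He to Terry and him to Tony (the clause-mate rule blocks Terry) and returns (["Terry", "Tony"], "Terry", "CONTINUE").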


Tetreault’s Evaluation of Pronoun Resolution Algorithms

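Algorithm    Right  % right  % intra  % inter
BFP           1004     59.4     75.1     48.0
S-list        1211     71.7     74.1     67.5
LRC order     1266     74.7     72.0     81.6
LRC GF        1268     74.9     72.0     82.0
Hobbs         1298     76.8     74.2     82.0
LRC toback    1362     80.4     77.7     87.3

(Right = number of pronouns resolved correctly; % intra and % inter = accuracy on pronouns whose antecedent is in the same sentence and in a previous sentence, respectively.)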

Massimo Poesio:

These are the figures for the NYT



Readings & References

Brennan, S., Friedman, M. and Pollard, C. 1987. A Centering approach to pronouns. Proc. of the 25th ACL, pp. 155-162.

Gordon, P. C., Grosz, B. J., and Gilliom, L. A. 1993. Pronouns, names, and the centering of attention in discourse. Cognitive Science, 17(3), 311-347.

Grosz, B., Joshi, A., and Weinstein, S. 1995. Centering: a framework for modeling the local coherence of discourse. Computational Linguistics, 21(2). [required reading]

Grosz, B. and Sidner, C. 1986. Attention, Intentions, and the Structure of Discourse. Computational Linguistics.

Kehler, A. 1997. Current theories of Centering for pronoun interpretation. Computational Linguistics, 23(3).

Tetreault, J. 2001. A corpus-based evaluation of Centering and anaphora resolution. Computational Linguistics, 27(4). [required reading]