
SIMS 290-2: Applied Natural Language Processing


Page 1: SIMS 290-2:  Applied Natural Language Processing

1

SIMS 290-2: Applied Natural Language Processing

Marti Hearst
December 1, 2004

Page 2: SIMS 290-2:  Applied Natural Language Processing

2

Today

Discourse Processing
– Going beyond the sentence

Characteristics
– Cohesion / coherence
– Given / new
– Rhetorical structure

Issues: Segmentation
– Linear
– Hierarchical
– Text vs. Dialogue
– Discourse cues vs. content change

Co-reference / anaphora resolution

Dialogue Processing

Page 3: SIMS 290-2:  Applied Natural Language Processing

3
Adapted from slide by Julia Hirschberg

What makes a text/dialogue coherent?

“Consider, for example, the difference between passages (18.71) and (18.72). Almost certainly not. The reason is that these utterances, when juxtaposed, will not exhibit coherence. Do you have a discourse? Assume that you have collected an arbitrary set of well-formed and independently interpretable utterances, for instance, by randomly selecting one sentence from each of the previous chapters of this book.”

vs….

Page 4: SIMS 290-2:  Applied Natural Language Processing

4
Adapted from slide by Julia Hirschberg

What makes a text/dialogue coherent?

“Assume that you have collected an arbitrary set of well-formed and independently interpretable utterances, for instance, by randomly selecting one sentence from each of the previous chapters of this book. Do you have a discourse? Almost certainly not. The reason is that these utterances, when juxtaposed, will not exhibit coherence. Consider, for example, the difference between passages (18.71) and (18.72).” (J&M:695)

Page 5: SIMS 290-2:  Applied Natural Language Processing

5
Adapted from slide by Julia Hirschberg

What makes a text coherent?

Discourse/topic structure: appropriate sequencing of subparts of the discourse

Rhetorical structure: appropriate use of coherence relations between subparts of the discourse

Referring expressions: words or phrases, the semantic interpretation of which is a discourse entity

Page 6: SIMS 290-2:  Applied Natural Language Processing

6
Adapted from slide by Julia Hirschberg

Information Status

Contrast

– John wanted a poodle but Becky preferred a corgi.

Topic/comment – The corgi they bought turned out to have fleas.

Theme/rheme – The corgi they bought turned out to have fleas.

Focus/presupposition – It was Becky who took him to the vet.

Given/new – Some wildcats bite, but this wildcat turned out to be a sweetheart.
– Contrast Speaker (S) and Hearer (H)

Page 7: SIMS 290-2:  Applied Natural Language Processing

7
Adapted from slide by Julia Hirschberg

Determining Given vs. New

Entities when first introduced are new:
– Brand-new (H must create a new entity): I saw a dinosaur today.
– Unused (H already knows of this entity): I saw your mother today.

Evoked entities are old -- already in the discourse:
– Textually evoked: The dinosaur was scaly and gray.
– Situationally evoked: The light was red when you went through it.

Inferrables:
– Containing: I bought a carton of eggs. One of them was broken.
– Non-containing: A bus pulled up beside me. The driver was a monkey.

Page 8: SIMS 290-2:  Applied Natural Language Processing

8
Adapted from slide by Julia Hirschberg

Given/New and Definiteness/Indefiniteness

Subject NPs tend to be syntactically definite and old
Object NPs tend to be indefinite and new

I saw a black cat yesterday. The cat looked hungry.
– Definite articles, demonstratives, possessives, personal pronouns, proper nouns, quantifiers like all, every

Indefinite articles, quantifiers like some, any, one signal indefiniteness…but….

This guy came into the room

Page 9: SIMS 290-2:  Applied Natural Language Processing

9

Discourse/Topic Structure

Text Segmentation: Linear
– TextTiling
– Look for changes in content words

Hierarchical
– Grosz & Sidner’s Centering theory
– Morris & Hirst’s algorithm
– Lexical chaining through Roget’s thesaurus

Hierarchical + Relations
– Mann et al.’s Rhetorical Structure Theory
– Marcu’s algorithm

Page 10: SIMS 290-2:  Applied Natural Language Processing

10

TextTiling

Goal: find multi-paragraph topics
Example: 21-paragraph article called Stargazers

Page 11: SIMS 290-2:  Applied Natural Language Processing

11
Adapted from slide by William Yerazunis

TextTiling

Goal: find multi-paragraph topics
But … it’s difficult to define topic (Brown & Yule)
Focus instead on topic shift or change
Change in content, by contrast with setting, scene, characters

Mechanism:
– compare adjacent blocks of text
– look for shifts in vocabulary

Page 12: SIMS 290-2:  Applied Natural Language Processing

12

Intuition behind TextTiling

Page 13: SIMS 290-2:  Applied Natural Language Processing

13
Adapted from slide by William Yerazunis

TextTiling Algorithm

Tokenization
Lexical Score Determination
– Blocks
– Vocabulary Introductions
– Chains
Boundary Identification

Page 14: SIMS 290-2:  Applied Natural Language Processing

14
Adapted from slide by William Yerazunis

Tokenization

Convert text stream into terms (words)
Remove “stop words”
Reduce to root (inflectional morphology)
Subdivide into “token-sequences” (a substitute for sentences)
Find potential boundary points (paragraph breaks)
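Below is a minimal Python sketch of this tokenization stage. The tiny stoplist, the suffix-stripping stemmer, and the default sequence length of 20 tokens are illustrative assumptions, not the exact choices of the original implementation.

    import re

    STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it"}  # toy stoplist

    def stem(term):
        # crude inflectional stripping; a real system would use a morphological analyzer
        for suffix in ("ing", "ed", "es", "s"):
            if term.endswith(suffix) and len(term) > len(suffix) + 2:
                return term[:-len(suffix)]
        return term

    def tokenize(text, w=20):
        """Lowercase, drop stopwords, stem, then group the surviving terms
        into fixed-size token-sequences of w tokens (a stand-in for sentences)."""
        words = re.findall(r"[a-z]+", text.lower())
        terms = [stem(t) for t in words if t not in STOPWORDS]
        return [terms[i:i + w] for i in range(0, len(terms), w)]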

Page 15: SIMS 290-2:  Applied Natural Language Processing

15
Adapted from slide by William Yerazunis

Determining Scores

Compute a score at each token-sequence gap
Score based on lexical occurrences
Block algorithm:

score(i) = \frac{\sum_t w_{t,b_1} \, w_{t,b_2}}{\sqrt{\sum_t w_{t,b_1}^2 \sum_t w_{t,b_2}^2}}

where w_{t,b} is the frequency of term t in block b, and b_1, b_2 are the blocks of token-sequences on either side of gap i.
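In code, this block score is just the cosine similarity of the term-frequency vectors of the two blocks around a gap. A short sketch, assuming the token-sequences produced above and an illustrative block size of k = 10:

    from collections import Counter
    from math import sqrt

    def block_score(seqs, i, k=10):
        """Cosine similarity between the k token-sequences before gap i
        and the k after it; seqs is a list of term lists."""
        b1 = Counter(t for s in seqs[max(0, i - k):i] for t in s)
        b2 = Counter(t for s in seqs[i:i + k] for t in s)
        num = sum(b1[t] * b2[t] for t in b1)
        denom = sqrt(sum(v * v for v in b1.values()) * sum(v * v for v in b2.values()))
        return num / denom if denom else 0.0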

Page 16: SIMS 290-2:  Applied Natural Language Processing

16

Page 17: SIMS 290-2:  Applied Natural Language Processing

17
Adapted from slide by William Yerazunis

Boundary Identification

Smooth the plot (average smoothing)
Assign depth score at each token-sequence gap
– “Deeper” valleys score higher
Order boundaries by depth score
Choose boundary cutoff (avg - sd/2 of the depth scores)
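The depth score and cutoff can be sketched directly from the description above; the hill-climb to the nearest peak on each side follows the published description, while the function names are illustrative:

    from statistics import mean, stdev

    def depth_score(scores, i):
        """Climb left and right from gap i to the nearest peaks and sum how far
        the gap's score sits below each: deeper valleys score higher."""
        l = i
        while l > 0 and scores[l - 1] >= scores[l]:
            l -= 1
        r = i
        while r < len(scores) - 1 and scores[r + 1] >= scores[r]:
            r += 1
        return (scores[l] - scores[i]) + (scores[r] - scores[i])

    def boundaries(scores):
        depths = [depth_score(scores, i) for i in range(len(scores))]
        cutoff = mean(depths) - stdev(depths) / 2   # the avg - sd/2 threshold
        return [i for i, d in enumerate(depths) if d > cutoff]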

Page 18: SIMS 290-2:  Applied Natural Language Processing

18
Adapted from slide by William Yerazunis

Evaluation

DATA
– Twelve news articles from Dialog
– Seven human judges per article
– “Major” boundaries: chosen by >= 3 judges
– Avg number of paragraphs: 26.75
– Avg number of boundaries: 10 (39%)

RESULTS
– Between upper and lower bounds
– Upper bound: judges’ averages
– Lower bound: reasonable simple algorithm

Page 19: SIMS 290-2:  Applied Natural Language Processing

19
Adapted from slide by William Yerazunis

Assessing Agreement Among Judges

KAPPA Coefficient
– Measures pairwise agreement
– Takes expected chance agreement into account
– P(A) = proportion of times judges agree
– P(E) = proportion of agreement expected by chance

Reported values:
– .43 to .68 (Isard & Carletta 95, boundaries)
– .65 to .90 (Rose 95, sentence segmentation)
– Here, k = .647

\kappa = \frac{P(A) - P(E)}{1 - P(E)}

P(E) = P(B)^2 + P(\lnot B)^2 = .39^2 + .61^2 \approx .52
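The computation is small enough to check directly. The slide gives P(B) = .39 and k = .647 but not P(A); the agreement value below is back-solved for illustration:

    def kappa(p_agree, p_boundary):
        """Two-category kappa: chance agreement P(E) is the probability that
        two judges pick the same label (boundary / no boundary) by chance."""
        p_e = p_boundary ** 2 + (1 - p_boundary) ** 2   # = .52 for P(B) = .39
        return (p_agree - p_e) / (1 - p_e)

    print(round(kappa(0.832, 0.39), 3))   # -> 0.647, matching the slide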

Page 20: SIMS 290-2:  Applied Natural Language Processing

20
Adapted from slide by William Yerazunis

TextTiling Conclusions

First computational investigation into multi-paragraph discourse units
Simple discourse cue: position-sensitive term repetition
Acceptable performance for some tasks
Has been reproduced/used by many researchers
Multi-lingual (applied by others to French, German, Arabic)

Page 21: SIMS 290-2:  Applied Natural Language Processing

21
Adapted from slide by Julia Hirschberg

What Can Hierarchical Structure Tell Us?

Welcome to word processing. That’s using a computer to type letters and reports. Make a typo?

No problem.

Just back up, type over the mistake, and it’s gone.

And, it eliminates retyping.

And, it eliminates retyping.

Page 22: SIMS 290-2:  Applied Natural Language Processing

22
Adapted from slide by Julia Hirschberg

Centering Theory of Discourse Structure (Grosz & Sidner ‘86)

A prominent theory of discourse structure
Provides for multiple levels of analysis: S’s purpose as well as content of utterances and S and H’s attentional state
Identifies only a few, general relations that hold among intentions
Often leads to a hierarchical structure

Three components:
– Linguistic structure
– Intentional structure
– Attentional structure

Page 23: SIMS 290-2:  Applied Natural Language Processing

23

Example of Hierarchical Analysis(Morris and Hirst ’91)

Page 24: SIMS 290-2:  Applied Natural Language Processing

24

Page 25: SIMS 290-2:  Applied Natural Language Processing

25
Adapted from slide by Julia Hirschberg

Rhetorical Structure Theory (Mann, Matthiessen, and Thompson ‘89)

One theory of discourse structure, based on identifying relations between parts of the text

Identify meaningful units and the relations between them

– Clauses and clause-like units that are unequivocally the nucleus or satellite of a rhetorical relation.

[Only the midday sun at tropical latitudes is warm enough] [to thaw ice on occasions,] [but any liquid water formed in this way would evaporate almost instantly] [because of the low atmospheric pressure.]

Nucleus/satellite notion encodes asymmetry

Page 26: SIMS 290-2:  Applied Natural Language Processing

26
Adapted from slide by Julia Hirschberg

Rhetorical Structure Theory

Some rhetorical relations:
– Elaboration (set/member, class/instance, whole/part, …)
– Contrast: multinuclear
– Condition: Sat presents precondition for N
– Purpose: Sat presents goal of the activity in N
– Sequence: multinuclear
– Result: N results from something presented in Sat
– Evidence: Sat provides evidence for something claimed in N

Page 27: SIMS 290-2:  Applied Natural Language Processing

27
Adapted from slide by Daniel Marcu

Determining high-level relations

[Smart cards are not a new phenomenon.1] [They have been in development since the late 1970s and have found major applications in Europe, with more than a quarter of a billion cards made so far.2] [The vast majority of chips have gone into prepaid, disposable telephone cards, but even so the experience gained has reduced manufacturing costs, improved reliability and proved the viability of smart cards.3] [International and national standards for smart cards are well under development to ensure that cards, readers and the software for the many different applications that may reside on them can work together seamlessly and securely.4] [Standards set by the International Organization for Standardization (ISO), for example, govern the placement of contacts on the face of a smart card so that any card and reader will be able to connect.5]

Page 28: SIMS 290-2:  Applied Natural Language Processing

28
Adapted from slide by Daniel Marcu

Representing implicit relations

[Smart cards are becoming more attractive2] [as the price of microcomputing power and storage continues to drop.3] [They have two main advantages over magnetic-stripe cards.4] [First, they can carry 10 or even 100 times as much information5] [- and hold it much more robustly.6] [Second, they can execute complex tasks in conjunction with a terminal.7]

Page 29: SIMS 290-2:  Applied Natural Language Processing

29
Adapted from slide by Julia Hirschberg

What’s the Rhetorical Structure?

System: Hello. How may I help you?
User: I would like to find out why I was charged for a call?
System: What call would you like to inquire about?
User: My bill says I made a call to Syncamaloo, Texas, but I’ve never even heard of this town.
System: May I have the date of the call that appears on your bill?

Page 30: SIMS 290-2:  Applied Natural Language Processing

30
Adapted from slide by Daniel Marcu

Issues for RST

Many variations in expression:
[I have not read this book.] [It was written by Bertrand Russell.]
[I have not read this book,] [which was written by Bertrand Russell.]
[I have not read this book written by Bertrand Russell.]
[I have not read this Bertrand Russell book.]

Rhetorical relations are ambiguous:
[He caught a bad fever] [while he was in Africa.]

– Circumstance > Temporal-Same-Time

[With its distant orbit, Mars experiences frigid weather conditions.] [Surface temperatures typically average about –60 degrees Celsius at the equator and can dip to –123 degrees C near the poles. ]

– Evidence > Elaboration

Page 31: SIMS 290-2:  Applied Natural Language Processing

31
Adapted from slide by Julia Hirschberg

Identifying RS Automatically (Marcu ’99)

Train a parser on a discourse treebank
– 90 RS trees, hand-annotated for rhetorical relations
– Elementary discourse units (edu’s) linked by RR
– Parser learns to identify N and S and their RR
– Features: WordNet-based similarity, lexical, structural

Uses discourse segmenter to identify discourse units
– Trained to segment on hand-labeled corpus (C4.5)
– Features: 5-word POS window, presence of discourse markers, punctuation, seen a verb?, …
– Eval: 96-98% accuracy
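A sketch of the kind of feature vector such a segmenter could learn from. The marker list, window shape, and feature names here are illustrative assumptions rather than Marcu's exact setup (with a generic learner standing in for C4.5):

    DISCOURSE_MARKERS = {"because", "but", "although", "however", "while", "then"}

    def segment_features(tagged, i):
        """Features for deciding whether an edu boundary follows token i;
        `tagged` is a list of (word, POS) pairs."""
        window = [tagged[j][1] if 0 <= j < len(tagged) else "NONE"
                  for j in range(i - 2, i + 3)]            # 5-word POS window
        word = tagged[i][0].lower()
        return window + [
            word in DISCOURSE_MARKERS,                     # discourse marker here?
            word in {",", ";", ":", "."},                  # punctuation?
            any(pos.startswith("VB") for _, pos in tagged[:i + 1]),  # seen a verb yet?
        ]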

Page 32: SIMS 290-2:  Applied Natural Language Processing

32
Adapted from slide by Julia Hirschberg

Identifying RS Automatically (Marcu ’99)

Evaluation of parser:
– Id edu’s: Recall 75%, Precision 97%
– Id hierarchical structure (2 edu’s related): Recall 71%, Precision 84%
– Id nucleus/satellite labels: Recall 58%, Precision 69%
– Id RR: Recall 38%, Precision 45%

Later errors due mostly to edu mis-identification
Id of hierarchical structure and n/s status comparable to human when hand-labeled edu’s used
Hierarchical structure is easier to id than RR

Page 33: SIMS 290-2:  Applied Natural Language Processing

33
Adapted from slide by Julia Hirschberg

Some Problems with RST (cf. Moore & Pollack ‘92)

How many Rhetorical Relations are there?
How can we use RST in dialogue as well as monologue?
RST does not allow for multiple relations holding between parts of a discourse
RST does not model overall structure of the discourse

Page 34: SIMS 290-2:  Applied Natural Language Processing

34
Adapted from slide by Ani Nenkova

Referring Expressions

Referring expressions are words or phrases, the semantic interpretation of which is a discourse entity (also called referent)

Discourse entities are semantic objects.
– Can have multiple syntactic realizations within a text

Discourse entities exist in the domain D in which a text is interpreted

Page 35: SIMS 290-2:  Applied Natural Language Processing

35
Adapted from slide by Ani Nenkova

Referring Expressions: Example

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 36: SIMS 290-2:  Applied Natural Language Processing

36
Adapted from slide by Ani Nenkova

Pronouns vs. Full NP

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 37: SIMS 290-2:  Applied Natural Language Processing

37
Adapted from slide by Ani Nenkova

Definite vs. Indefinite NPs

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 38: SIMS 290-2:  Applied Natural Language Processing

38
Adapted from slide by Ani Nenkova

Common Noun vs. Proper Noun

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 39: SIMS 290-2:  Applied Natural Language Processing

39
Adapted from slide by Ani Nenkova

Modified vs. Bare head NP

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 40: SIMS 290-2:  Applied Natural Language Processing

40
Adapted from slide by Ani Nenkova

Premodified vs. Postmodified

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 41: SIMS 290-2:  Applied Natural Language Processing

41
Adapted from slide by Ani Nenkova

Anaphora resolution

Finding in a text all the referring expressions that have one and the same denotation

– Pronominal anaphora resolution
– Anaphora resolution between named entities
– Full noun phrase anaphora resolution

Page 42: SIMS 290-2:  Applied Natural Language Processing

42
Adapted from slide by Ani Nenkova

Anaphora Resolution

A pretty woman entered the restaurant. She sat at the table next to mine and only then I recognized her. This was Amy Garcia, my next door neighbor from 10 years ago. The woman has totally changed! Amy was at the time shy…

Page 43: SIMS 290-2:  Applied Natural Language Processing

43
Adapted from slide by Ani Nenkova

Pronominal anaphora resolution

Rule-based vs statistical
– (Ken 1996), (Lap 1994) vs (Ge 1998)

Performed on full syntactic parse vs on shallow syntactic parse
– (Lap 1994), (Ge 1998) vs (Ken 1996)

Type of text used for the evaluation
– (Lap 1994) computer manual texts (86% accuracy)
– (Ge 1998) WSJ articles (83% accuracy)
– (Ken 1996) different genres (75% accuracy)
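The rule-based line of work scores pronoun candidates by grammatical salience after hard agreement filtering. A toy sketch in that spirit (the weights, feature names, and data shapes are invented for illustration, not any published system's values):

    def resolve_pronoun(pronoun, candidates):
        """Pick the most salient preceding NP that agrees with the pronoun.
        candidates: dicts like {"text": ..., "gender": ..., "number": ...,
        "sentence_distance": int, "is_subject": bool}."""
        def salience(c):
            score = 100 - 50 * c["sentence_distance"]   # recency
            if c["is_subject"]:
                score += 80                              # subjects are more salient
            return score
        agreeing = [c for c in candidates
                    if c["gender"] == pronoun["gender"]
                    and c["number"] == pronoun["number"]]   # hard agreement filter
        return max(agreeing, key=salience) if agreeing else None

    she = {"gender": "f", "number": "sg"}
    print(resolve_pronoun(she, [
        {"text": "a pretty woman", "gender": "f", "number": "sg",
         "sentence_distance": 1, "is_subject": True},
        {"text": "the restaurant", "gender": "n", "number": "sg",
         "sentence_distance": 1, "is_subject": False},
    ])["text"])   # -> a pretty woman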

Page 44: SIMS 290-2:  Applied Natural Language Processing

44
Adapted from slide by Ani Nenkova

Pronominal anaphora resolution

Generic vs specific reference
1. The Vice-President of the United States is also President of the Senate.
2. Historically, he is the President’s key person in negotiations with Congress.
3a. He is required to be 35 years old.
3b. As Ambassador to China, he handled many tricky negotiations, so he is well prepared for the job.

Page 45: SIMS 290-2:  Applied Natural Language Processing

45

Talking to a Machine….and (often) Getting an Answer

Today’s spoken dialogue systems make it possible to accomplish real tasks without talking to a person

Key advances:
– Stick to goal-directed interactions in a limited domain
– Prime users to adopt the vocabulary you can recognize
– Partition the interaction into manageable stages
– Judicious use of system vs. mixed initiative

Page 46: SIMS 290-2:  Applied Natural Language Processing

46
Adapted from slide by Julia Hirschberg

Acoustic and Prosodic Cues to Discourse Structure

Intuition:
– Speakers vary acoustic and prosodic cues to convey variation in discourse structure
– Systematic? In read or spontaneous speech?

Evidence:
– Observations from recorded corpora
– Laboratory experiments
– Machine learning of discourse structure from acoustic/prosodic features

Page 47: SIMS 290-2:  Applied Natural Language Processing

47
Adapted from slide by Julia Hirschberg

Boston Directions Corpus (Hirschberg & Nakatani ’96)

Experimental Design
– 12 speakers: 4 used
– Spontaneous and read versions of 9 direction-giving tasks

Corpus: 50m read; 67m spontaneous

Labeling
– Prosodic: ToBI intonational labeling
– Discourse: Grosz & Sidner

Features used in analysis

Page 48: SIMS 290-2:  Applied Natural Language Processing

48
Adapted from slide by Julia Hirschberg

Boston Directions Corpus: Describe how to get to MIT from Harvard

ds1: step 1, enter and get token
first
enter the Harvard Square T stop
and buy a token

ds2: inbound on red line
then
proceed to get on the
inbound
um
Red Line
uh subway

Page 49: SIMS 290-2:  Applied Natural Language Processing

49
Adapted from slide by Julia Hirschberg

ds3: take subway from hs, to cs to ks
and
take the subway
from Harvard Square
to Central Square
and then to Kendall Square

ds4: describe ks station
you’ll see a music sculpture there
which will tell you it’s Kendall Square
it’s very nice

ds5: get off T
then get off the T

Page 50: SIMS 290-2:  Applied Natural Language Processing

50

Dialogue vs. Monologue

Monologue and dialogue both involve interpreting:
– Information status
– Coherence issues
– Reference resolution
– Speech acts, implicature, intentionality

Dialogue involves managing:
– Turn-taking
– Grounding and repairing misunderstandings
– Initiative and confirmation strategies

Page 51: SIMS 290-2:  Applied Natural Language Processing

51

Segmenting Speech into Utterances

What is an ‘utterance’?
Why is EOU detection harder than EOS?
How does speech differ from text?

A single syntactic sentence may span several turns:
A: We’ve got you on USAir flight 99
B: Yep
A: leaving on December 1.

Multiple syntactic sentences may occur in a single turn:
A: We’ve got you on USAir flight 99 leaving on December 1. Do you need a rental car?

Intonational definitions: intonational phrase, breath group, intonation unit

Page 52: SIMS 290-2:  Applied Natural Language Processing

52

Turns and Utterances

Dialogue is characterized by turn-taking: who should talk next, and when they should talk
How do we identify turns in recorded speech?
– Little speaker overlap (around 5% in English, although this depends on domain)
– But little silence between turns either
How do we know when a speaker is giving up or taking a turn? Holding the floor? How do we know when a speaker is interruptible?

Page 53: SIMS 290-2:  Applied Natural Language Processing

53

Simplified Turn-Taking Rule (Sacks et al.)

At each transition-relevance place (TRP) of each turn:

1. If current speaker has selected A as next speaker, then A must speak next
2. If current speaker does not select next speaker, any other speaker may take next turn
3. If no one else takes next turn, the current speaker may take next turn

TRPs are where the structure of the language allows speaker shifts to occur
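Because the rule is an ordered preference, it translates directly into code. A literal toy encoding (the function shape and argument names are illustrative):

    def next_speaker(current, selected, volunteers):
        """Apply the simplified Sacks et al. rule at a TRP.
        selected: the speaker the current turn addresses (or None);
        volunteers: speakers who self-select, in order."""
        if selected is not None:
            return selected          # rule 1: the selected speaker must speak next
        if volunteers:
            return volunteers[0]     # rule 2: another speaker may self-select
        return current               # rule 3: current speaker may continue

    print(next_speaker("A", None, ["B", "C"]))   # -> B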

Page 54: SIMS 290-2:  Applied Natural Language Processing

54

Adjacency pairs set up next speaker expectations

GREETING/GREETING
QUESTION/ANSWER
COMPLIMENT/DOWNPLAYER
REQUEST/GRANT

‘Significant silence’ is dispreferred:
A: Is there something bothering you or not? (1.0s)
A: Yes or no? (1.5s)
A: Eh?
B: No.

Page 55: SIMS 290-2:  Applied Natural Language Processing

55

Turntaking and Initiative Strategies

System Initiative
S: Please give me your arrival city name.
U: Baltimore.
S: Please give me your departure city name….

User Initiative
S: How may I help you?
U: I want to go from Boston to Baltimore on November 8.

‘Mixed’ initiative
S: How may I help you?
U: I want to go to Boston.
S: What day do you want to go to Boston?

Page 56: SIMS 290-2:  Applied Natural Language Processing

56

Grounding (Clark & Schaefer ‘89)

Conversational participants don’t just take turns speaking… they try to establish common ground (or mutual belief)
H must ground S’s utterances by making it clear whether or not understanding has occurred
How do hearers do this?

Several different mechanisms

Page 57: SIMS 290-2:  Applied Natural Language Processing

57

Grounding Mechanisms (Clark & Schaefer ‘89)

S: I can upgrade you to an SUV at that rate.

Continued attention
(U gazes appreciatively at S)

Relevant next contribution
U: Do you have a RAV4 available?

Acknowledgement/backchannel
U: Ok/Mhmmm/Great!

Demonstration/paraphrase
U: An SUV.

Display/repetition
U: You can upgrade me to an SUV at the same rate?

Request for repair
U: I beg your pardon?

Page 58: SIMS 290-2:  Applied Natural Language Processing

58

How do we evaluate Dialogue Systems?

PARADISE framework (Walker et al ’00)

“Performance” of a dialogue system is affected both by what gets accomplished by the user and the dialogue agent and how it gets accomplished

Efficiency of the Interaction: User Turns, System Turns, Elapsed Time
Quality of the Interaction: ASR rejections, Time Out Prompts, Help Requests, Barge-Ins, Mean Recognition Score (concept accuracy), Cancellation Requests
User Satisfaction
Task Success: perceived completion, information extracted
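PARADISE folds these factors into a single score: normalized task success traded off against a weighted sum of normalized costs. A minimal sketch, assuming z-score normalization and hand-picked weights (in the actual framework, the weights are fit by regressing user satisfaction on these measures):

    def performance(task_kappa, costs, alpha, weights, stats):
        """alpha * N(kappa) - sum_i w_i * N(cost_i), with N a z-score
        normalization using per-metric (mean, stdev) pairs in `stats`."""
        def norm(value, metric):
            m, s = stats[metric]
            return (value - m) / s
        return alpha * norm(task_kappa, "kappa") - sum(
            weights[m] * norm(v, m) for m, v in costs.items())

    # one dialogue, normalized against (illustrative) corpus statistics
    print(performance(
        task_kappa=0.8,
        costs={"user_turns": 22, "timeouts": 3},
        alpha=0.5,
        weights={"user_turns": 0.2, "timeouts": 0.3},
        stats={"kappa": (0.6, 0.2), "user_turns": (18, 6), "timeouts": (1.5, 1.0)},
    ))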

Page 59: SIMS 290-2:  Applied Natural Language Processing

59

Identifying Misrecognitions and User Corrections Automatically (Hirschberg, Litman & Swerts)

Collect corpus from interactive voice response system
Identify speaker ‘turns’
– incorrectly recognized
– where speakers first aware of error
– that correct misrecognitions
Identify prosodic features of turns in each category and compare to other turns
Use Machine Learning techniques to train a classifier to make these distinctions automatically
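The final step is a standard supervised setup: one feature vector per turn, labeled by whether the ASR misrecognized it. The feature names and toy values below are illustrative, and a generic decision tree stands in for the rule learners actually used in this work:

    from sklearn.tree import DecisionTreeClassifier

    # one row per turn: [f0_max, energy_max, duration_sec, preceding_pause_sec, tempo]
    X = [
        [310.0, 72.1, 2.4, 0.9, 3.1],   # loud, slow, hyperarticulated turn
        [215.0, 60.3, 1.1, 0.2, 4.0],   # ordinary turn
        [298.0, 70.8, 2.0, 0.7, 2.8],
        [201.0, 58.9, 0.9, 0.1, 4.2],
    ]
    y = [1, 0, 1, 0]                     # 1 = turn was misrecognized by the ASR

    clf = DecisionTreeClassifier().fit(X, y)
    print(clf.predict([[305.0, 71.0, 2.2, 0.8, 3.0]]))   # predicts the misrecognized class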

Page 60: SIMS 290-2:  Applied Natural Language Processing

60

Turn Types

TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you?

User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening.

TOOT: Which city do you want to go to?

User: New York.

(The slide annotates turns in this exchange as the misrecognition, the correction, and the aware site: the point where the user first becomes aware of the error.)

Page 61: SIMS 290-2:  Applied Natural Language Processing

61

Results

Reduced error in predicting misrecognized turns to 8.64%
Error in predicting ‘awares’: 12%
Error in predicting corrections: 18-21%

Page 62: SIMS 290-2:  Applied Natural Language Processing

62
Adapted from slide by Julia Hirschberg

Dialogue Conclusions

Spoken dialogue systems present new problems -- but also new possibilities

Recognizing speech introduces a new source of errors
Additional information in the speech stream offers new evidence about users’ intended meanings and emotional state (grounding of information, speech acts, reaction to system errors)

Why spoken dialogue systems rather than web-based interfaces?