+ Topics in Advanced Dialog
CS136a Speech Recognition
Marie Meteer

Topics in Advance Dialog - Brandeiscs136a/CS136a_Slides/CS136a... · 2018-11-13 · Figure 25.1 Architecture of a dialog-state system for task-oriented dialog from Williams et al


+Overview

• Dialog Acts
• Dialog State: Interpretation
  • Sketching an algorithm for dialog act interpretation
  • Special case: Detecting correction acts
• Dialog Policy
  • Generating Dialog Acts: Confirming and Rejecting
• A simple policy based on local context
• Natural language generation in the dialog-state model

+Frame-based dialog agents

• Sometimes called "task-based dialog agents"
• Based on a "domain ontology"
  • A knowledge structure representing user intentions
• One or more frames
  • Each a collection of slots
  • Each slot having a value

+The Frame

• A set of slots, to be filled with information of a given type
• Each associated with a question to the user

Slot      Type  Question
ORIGIN    city  What city are you leaving from?
DEST      city  Where are you going?
DEP DATE  date  What day would you like to leave?
DEP TIME  time  What time would you like to leave?
AIRLINE   line  What is your preferred airline?
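A minimal sketch of how the frame above could be held in code. The `Slot` class, the list layout, and the helper names are assumptions made for illustration; only the slot names, types, and questions come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """One slot of a frame: a name, an expected value type, and a prompt."""
    name: str
    type: str
    question: str
    value: Optional[str] = None  # None means the slot is still unfilled


def make_flight_frame() -> List[Slot]:
    """Build the flight frame from the slide (the list layout is an assumption)."""
    return [
        Slot("ORIGIN", "city", "What city are you leaving from?"),
        Slot("DEST", "city", "Where are you going?"),
        Slot("DEP DATE", "date", "What day would you like to leave?"),
        Slot("DEP TIME", "time", "What time would you like to leave?"),
        Slot("AIRLINE", "line", "What is your preferred airline?"),
    ]


def unfilled(frame: List[Slot]) -> List[str]:
    """Names of the slots that still need a value."""
    return [slot.name for slot in frame if slot.value is None]


frame = make_flight_frame()
frame[1].value = "Seattle"  # e.g. after "I'd like to fly to Seattle"
print(unfilled(frame))
```

Keeping the question next to the slot means the system can always find something to ask while any slot is empty.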

+Frame-based dialogue agents

• 1977: GUS (Bobrow et al., Artificial Intelligence Journal, 1977)
• Still the industrial state of the art
• SIRI based on GUS architecture

ARTIFICIAL INTELLIGENCE 155

GUS, A Frame-Driven Dialog System
Daniel G. Bobrow, Ronald M. Kaplan, Martin Kay, Donald A. Norman, Henry Thompson and Terry Winograd

Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, U.S.A.

Recommended by Don Walker

ABSTRACT
GUS is the first of a series of experimental computer systems that we intend to construct as part of a program of research on language understanding. In large measure, these systems will fill the role of periodic progress reports, summarizing what we have learned, assessing the mutual coherence of the various lines of investigation we have been following, and suggesting where more emphasis is needed in future work. GUS (Genial Understander System) is intended to engage a sympathetic and highly cooperative human in an English dialog, directed towards a specific goal within a very restricted domain of discourse. As a starting point, GUS was restricted to the role of a travel agent in a conversation with a client who wants to make a simple return trip to a single city in California.

There is good reason for restricting the domain of discourse for a computer system which is to engage in an English dialog. Specializing the subject matter that the system can talk about permits it to achieve some measure of realism without encompassing all the possibilities of human knowledge or of the English language. It also provides the user with specific motivation for participating in the conversation, thus narrowing the range of expectations that GUS must have about the user's purposes. A system restricted in this way will be more able to guide the conversation within the boundaries of its competence.

1. Motivation and Design Issues
Within its limitations, GUS is able to conduct a more-or-less realistic dialog. But the outward behavior of this first system is not what makes it interesting or significant. There are, after all, much more convenient ways to plan a trip and, unlike some other artificial intelligence programs, GUS does not offer services or furnish information that are otherwise difficult or impossible to obtain. The system is interesting because of the phenomena of natural dialog that it attempts to model

†This work was done by the language understander project at the Xerox Palo Alto Research center. Additional affiliations: D. A. Norman, University of California, San Diego; H. Thompson, University of California, Berkeley; and T. Winograd, Stanford University.

Artificial Intelligence 8 (1977), 155-173
Copyright © 1977 by North-Holland Publishing Company

+The GUS architecture

• A kind of mixed initiative
  • The conversational initiative shifts between system and user
• The structure of the frame guides dialogue
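One way to picture "the frame guides the dialogue" with mixed initiative is the sketch below. It is illustrative only, not the GUS implementation: the dictionary frame, the function names, and the parser output are all assumptions.

```python
from typing import Dict, Optional

# Slot questions in frame order (wording from the frame slide).
QUESTIONS = {
    "ORIGIN": "What city are you leaving from?",
    "DEST": "Where are you going?",
    "DEP DATE": "What day would you like to leave?",
}


def next_question(frame: Dict[str, Optional[str]]) -> Optional[str]:
    """System initiative: ask about the first slot that is still empty."""
    for slot, question in QUESTIONS.items():
        if frame.get(slot) is None:
            return question
    return None  # frame complete, nothing left to ask


def update(frame: Dict[str, Optional[str]], parsed: Dict[str, str]) -> None:
    """User initiative: accept any slots the user volunteered, asked or not."""
    for slot, value in parsed.items():
        if slot in QUESTIONS:
            frame[slot] = value


frame = {slot: None for slot in QUESTIONS}
print(next_question(frame))  # the system opens by asking for ORIGIN
# Hypothetical parser output for "I'd like to fly to Seattle Tuesday morning"
update(frame, {"DEST": "Seattle", "DEP DATE": "Tuesday"})
print(next_question(frame))  # ORIGIN is still empty, so it is asked again
```

The user can answer a different question than the one asked, and the system simply skips slots that are already filled.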

+A travel dialog: Communicator, Xu and Rudnicky (2000)

S1: Hello. You’ve reached the [Communicator]. Tell me your full name.

U2: Hi, I’d like to fly to Seattle Tuesday morning

S3: Traveling to Seattle on Tuesday, August 11th in the morning. Your full name?

U4: John Doe

S5: Hello, John. I’m glad to hear from you again. OK. Let’s see, I have a United flight departing Pittsburgh at 10:45 am arriving into Seattle at 1:57 pm. Is that OK?

U6: Yes

S7: Will you return to Pittsburgh from Seattle?

U8: No, I will go on to Los Angeles

S9: Travelling to Los Angeles on what date

U10: Sorry, can we go back to the previous leg again

S11: For the first leg, you are on United flight 1446 departing Pittsburgh at 10:45 AM …

+Dialog State Tracking Architecture (REDO)

2 CHAPTER 25 • ADVANCED DIALOG SYSTEMS

the user said, and may need to ask clarification questions. The system needs a dialog policy to decide what to say (when to answer the user’s questions, when to instead ask the user a clarification question, make a suggestion, and so on).

Figure 25.1 shows a typical architecture for a dialog-state system. It has six components. As with the GUS-style frame-based systems, the speech recognition and understanding components extract meaning from the input, and the generation and TTS components map from meaning to speech. The parts that are different than the simple GUS system are the dialog state tracker, which maintains the current state of the dialog (which includes the user’s most recent dialog act, plus the entire set of slot-filler constraints the user has expressed so far), and the dialog policy, which decides what the system should do or say next.

[Figure: pipeline ASR → SLU → DST → Dialog Policy → NLG → TTS, with example output at each stage]

Automatic Speech Recognition (ASR)
  LEAVING FROM DOWNTOWN   0.6
  LEAVING AT ONE P M      0.2
  ARRIVING AT ONE P M     0.1

Spoken Language Understanding (SLU)
  { from: downtown }      0.5
  { depart-time: 1300 }   0.3
  { arrive-time: 1300 }   0.1

Dialog State Tracker (DST)
  from: CMU       to: airport  depart-time: 1300  confirmed: no  score: 0.10
  from: CMU       to: airport  depart-time: 1300  confirmed: no  score: 0.15
  from: downtown  to: airport  depart-time: --    confirmed: no  score: 0.65

Dialog Policy
  act: confirm  from: downtown

Natural Language Generation (NLG), Text to Speech (TTS)
  FROM DOWNTOWN, IS THAT RIGHT?

Figure 1: Principal components of a spoken dialog system.

The topic of this paper is the dialog state tracker (DST). The DST takes as input all of the dialog history so far, and outputs its estimate of the current dialog state – for example, in a restaurant information system, the dialog state might indicate the user’s preferred price range and cuisine, what information they are seeking such as the phone number of a restaurant, and which concepts have been stated vs. confirmed. Dialog state tracking is difficult because ASR and SLU errors are common, and can cause the system to misunderstand the user. At the same time, state tracking is crucial because the dialog policy relies on the estimated dialog state to choose actions – for example, which restaurants to suggest.

In the literature, numerous methods for dialog state tracking have been proposed. These are covered in detail in Section 3; illustrative examples include hand-crafted rules (Larsson and Traum, 2000; Bohus and Rudnicky, 2003), heuristic scores (Higashinaka et al., 2003), Bayesian networks (Paek and Horvitz, 2000; Williams and Young, 2007), and discriminative models (Bohus and Rudnicky, 2006). Techniques have been fielded which scale to realistically sized dialog problems and operate in real time (Young et al., 2010; Thomson and Young, 2010; Williams, 2010; Mehta et al., 2010). In end-to-end dialog systems, dialog state tracking has been shown to improve overall system performance (Young et al., 2010; Thomson and Young, 2010).

Despite this progress, direct comparisons between methods have not been possible because past studies use different domains and different system components for ASR, SLU, dialog policy, etc. Moreover, there has not been a standard task or methodology for evaluating dialog state tracking. Together these issues have limited progress in this research area.

The Dialog State Tracking Challenge (DSTC) series has provided a first common testbed and evaluation suite for dialog state tracking. Three instances of the DSTC have been run over a three

Figure 25.1 Architecture of a dialog-state system for task-oriented dialog from Williams et al. (2016).
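The last stages of the figure can be sketched as follows. This is a toy illustration, not the tracker or policy from Williams et al.: the state hypotheses are copied from the figure, while the function names, the `proceed` act, and the single NLG template are assumptions.

```python
# Dialog-state hypotheses as shown in the figure (score = tracker's estimate).
hypotheses = [
    {"from": "CMU", "to": "airport", "depart-time": "1300",
     "confirmed": False, "score": 0.10},
    {"from": "CMU", "to": "airport", "depart-time": "1300",
     "confirmed": False, "score": 0.15},
    {"from": "downtown", "to": "airport", "depart-time": None,
     "confirmed": False, "score": 0.65},
]


def choose_act(states):
    """Toy policy: take the top-scoring state and confirm its origin."""
    best = max(states, key=lambda state: state["score"])
    if not best["confirmed"]:
        return {"act": "confirm", "from": best["from"]}
    return {"act": "proceed"}  # hypothetical act name, not from the figure


def realize(act):
    """Toy NLG: a single template for the confirm act."""
    if act["act"] == "confirm":
        return f"FROM {act['from'].upper()}, IS THAT RIGHT?"
    return "OK."


act = choose_act(hypotheses)
print(act)           # {'act': 'confirm', 'from': 'downtown'}
print(realize(act))  # FROM DOWNTOWN, IS THAT RIGHT?
```

The policy only ever sees the tracker's scored hypotheses, which is why tracking errors propagate directly into what the system says.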

As of the time of this writing, no commercial system uses a full dialog-state architecture, but some aspects of this architecture are beginning to appear in industrial systems, and there are a wide variety of these systems in research labs.

25.1 Dialog Acts

A key insight into conversation, due originally to the philosopher Wittgenstein (1953) but worked out more fully by Austin (1962), is that each utterance in a dialog is a kind of action being performed by the speaker. These actions are commonly called speech acts; here’s one taxonomy consisting of 4 major classes (Bach and Harnish, 1979):

+Speech Acts

• Austin (1962): An utterance is a kind of action
• Clear case: performatives
  • I name this ship the Titanic
  • I second that motion
  • I bet you five dollars it will snow tomorrow
• Performative verbs (name, second)
• Austin’s idea: not just these verbs

+Each utterance is 3 acts

• Locutionary act: the utterance of a sentence with a particular meaning
• Illocutionary act: the act of asking, answering, promising, etc., in uttering a sentence.
• Perlocutionary act: the (often intentional) production of certain effects upon the thoughts, feelings, or actions of addressee in uttering a sentence.

+Syntax ≠ Intention

Utterance                                Locutionary Force  Illocutionary Force  Perlocutionary Force
Can I have the rest of your sandwich?    Question           Request              Effect: You give me sandwich (or you are amused by my quoting from “Diner”) (or etc)
  Or: Are you going to finish that?
I want the rest of your sandwich         Declarative        Request              Effect: as above
Give me your sandwich!                   Imperative         Request              Effect: as above.

+5 classes of speech acts: Searle (1975)

• Assertives: committing the speaker to something’s being the case
  • (suggesting, putting forward, swearing, boasting, concluding)
• Directives: attempts by the speaker to get the addressee to do something
  • (asking, ordering, requesting, inviting, advising, begging)
• Commissives: committing the speaker to some future course of action
  • (promising, planning, vowing, betting, opposing).
• Expressives: expressing the psychological state of the speaker about a state of affairs
  • (thanking, apologizing, welcoming, deploring).
• Declarations: bringing about a different state of the world via the utterance
  • (I resign; You’re fired)

+Grounding

• Why do elevator buttons light up?
• Clark (1996) (after Norman 1988)
  • Principle of closure. Agents performing an action require evidence, sufficient for current purposes, that they have succeeded in performing it
• What is the linguistic correlate of this?

+Common Ground

• Dialog is a collective act performed by the speaker and the hearer.
  • The hearer must ground the speaker’s utterances
  • To acknowledge, to make it clear that the hearer has understood the speaker’s meaning and intention
• When the speaker has not succeeded, the hearer needs to indicate that to the speaker

+Clark & Schaefer (1989)

Continuum of methods used for grounding

Continued attention  B shows she is continuing to attend and therefore remains satisfied with A’s presentation (e.g. backchannel)
Next contribution    B starts in on the next relevant contribution
Acknowledgment       B nods or says a continuer like uh-huh, yeah, or the like, or an assessment like that’s great
Demonstration        B demonstrates all or part of what she has understood A to mean, for example, by reformulating (paraphrasing) A’s utterance or by collaborative completion of A’s utterance
Display              B displays verbatim all or part of A’s presentation

+A human-human conversation

• C: …I need to travel in May
• A: And what day in May did you want to travel
• C: OK, uh, I need to be there for a meeting that’s from the 12th to the 15th
• A: And you’re flying into what city?
• C: Seattle
• A: And what time would you like to leave Pittsburgh?
• C: Uh hmm, I don’t think there’s many options for a nonstop.
• A: Right, there’s only three non-stops today
• C: What are they?
• ....

+Backchannel

• Compare no backchannel
    System: Did you want to review some more of your personal profile?
    Caller: No.
    System: What’s next?
• With backchannel
    System: Did you want to review some more of your personal profile?
    Caller: No.
    System: Okay, what’s next?

+Speech Act + Grounding = Dialog Act

• Tag which represents the interactive function of the sentence being tagged
• Set tends to be task specific
  • Abstraction over the set of “intents”
  • Plus grounding and generic conversational moves
    • E.g. hello, goodbye, backchannel, accept, deny, clarify

+Verbmobil Dialogue Acts

THANK            Thanks
GREET            Hello Dan
INTRODUCE        It’s me again
BYE              All right, bye
REQUEST-COMMENT  How does that look?
SUGGEST          June 13th through 17th
REJECT           No, Friday I’m booked all day
ACCEPT           Saturday sounds fine
REQUEST-SUGGEST  What is a good day of the week for you?
INITIATE         I wanted to make an appointment with you
GIVE_REASON      Because I have meetings all afternoon
FEEDBACK         Okay
DELIBERATE       Let me check my calendar here
CONFIRM          Okay, that would be wonderful
CLARIFY          Okay, do you mean Tuesday the 23rd?

Slides from Dan Jurafsky and Paul Martin

+HIS restaurant recommendation system, Young et al. (2010)

Utterance                                                         Dialog act
Hi, I am looking for somewhere to eat.                            hello(task = find, type = restaurant)
You are looking for a restaurant. What type of food do you like?  confreq(type = restaurant, food)
I’d like an Italian somewhere near the museum.                    inform(food = Italian, near = museum)
Roma is a nice Italian restaurant near the museum.                inform(name = "Roma", type = restaurant, food = Italian, near = museum)
Is it reasonably priced?                                          confirm(pricerange = moderate)
Yes, Roma is in the moderate price range.                         affirm(name = "Roma", pricerange = moderate)
What is the phone number?                                         request(phone)

Hello, inform, request, reqalt, confirm, select, affirm, negate, deny, bye
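The act(slot = value, ...) notation in the table is regular enough to parse mechanically. The sketch below is an assumption about how one might read it into Python data, not code from the HIS system; `parse_dialog_act` is a hypothetical name.

```python
import re


def parse_dialog_act(text):
    """Split 'act(slot = value, slot2)' into (act, {slot: value-or-None})."""
    match = re.fullmatch(r"\s*(\w+)\s*\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a dialog act: {text!r}")
    act, body = match.group(1), match.group(2)
    slots = {}
    # Naive split: assumes values never contain commas.
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        value = value.strip().strip('"')
        slots[name.strip()] = value or None  # bare slot names carry no value
    return act, slots


print(parse_dialog_act("inform(food = Italian, near=museum)"))
print(parse_dialog_act("confreq(type = restaurant, food)"))
print(parse_dialog_act("request(phone)"))
```

A slot without a value, such as `food` in `confreq` or `phone` in `request`, comes back as `None`, which marks it as something being asked for rather than stated.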

+Conversational analysis (Sacks et al., 1974)

• Adjacency pairs
  • Question/answer
  • Greeting/greeting
  • Compliment/downplayer
  • Request/grant
• Side sequence, e.g. clarification subdialog
    User: What do you have going to UNKNOWN WORD on the 5th?
    System: Let’s see, going where on the 5th?
    User: Going to Hong Kong.
    System: OK, here are some flights...
• Presequence
    User: Can you make train reservations?
    System: Yes I can.
    User: Great, I’d like to reserve a seat on the 4pm train to New York

+Dialog State: Interpreting Dialog Acts

• How to determine which dialog act, e.g. question, statement
• Syntax
    YES-NO QUESTION  Will breakfast be served on USAir 1557?
    STATEMENT        I don’t care about lunch.
    COMMAND          Show me flights from Milwaukee to Orlando.
• But: Question → Request
    Q: Can you give me a list of the flights from Atlanta to Boston?
    A: Yes
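To make the limitation concrete, here is a deliberately naive classifier that looks only at surface syntax (the first word). The word lists and the function name are assumptions chosen to cover the examples on the slide.

```python
# Small hand-picked word lists; both are assumptions for this illustration.
AUX_VERBS = {"will", "can", "could", "would", "do", "does", "is", "are"}
COMMAND_VERBS = {"show", "give", "list", "tell", "book"}


def surface_form(utterance):
    """Guess the sentence type from the first word only."""
    words = utterance.lower().split()
    first = words[0] if words else ""
    if first in AUX_VERBS:
        return "YES-NO QUESTION"
    if first in COMMAND_VERBS:
        return "COMMAND"
    return "STATEMENT"


for text in [
    "Will breakfast be served on USAir 1557?",
    "I don't care about lunch.",
    "Show me flights from Milwaukee to Orlando.",
    "Can you give me a list of the flights from Atlanta to Boston?",
]:
    print(surface_form(text), "|", text)
```

The last utterance is labeled YES-NO QUESTION even though the speaker is making a request, which is exactly why answering "Yes" is the wrong response.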

+Additional Dialog Acts

• Task-oriented dialogs have dialog acts such as “Hold” and “Check”
• Again syntax doesn’t help
  • Check syntax is a statement
  • Presumably prosody indicated question, but we don’t have that.

OPEN-OPTION  I was wanting to make some arrangements for a trip that I’m going to be taking uh to LA uh beginning of the week after next.
HOLD         OK uh let me pull up your profile and I’ll be right with you here. [pause]
CHECK        And you said you wanted to travel next week?
ACCEPT       Uh yes.

+Detecting correction acts

• System needs to detect when a user is correcting some misunderstanding.
• Corrections can be harder for speech recognizers due to hyperarticulation
  • Usually exact or partial repetitions or sometimes paraphrases
• Features for detecting corrections

Lexical features   words like “no”, “correction”, “I don’t”, or even swear words, utterance length
Semantic features  overlap between the candidate correction act and the user’s prior utterance (computed by word overlap or via cosines over embedding vectors)
Phonetic features  phonetic overlap between the candidate correction act and the user’s prior utterance (e.g. “WhatsApp” may be incorrectly recognized as “What’s up”)
Prosodic features  hyperarticulation, increases in F0 range, pause duration, and word duration, generally normalized by the values for previous sentences
ASR features       ASR confidence, language model probability
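Two of the feature families above, lexical cues and word overlap, are easy to sketch. The tokenizer, the Jaccard formulation of overlap, and the function names are assumptions; a real detector would feed these features, along with the prosodic and ASR ones, into a trained classifier.

```python
CORRECTION_CUES = ("no", "correction", "i don't")  # cue words from the slide


def tokens(text):
    """Lowercase the text and strip punctuation from the ends of each word."""
    return [word.strip(".,!?\"'") for word in text.lower().split()]


def has_cue(utterance):
    """Lexical feature: does the utterance contain a correction cue?"""
    toks = tokens(utterance)
    joined = " ".join(toks)
    for cue in CORRECTION_CUES:
        if " " in cue:
            if cue in joined:  # multi-word cues are matched as substrings
                return True
        elif cue in toks:      # single words must match a whole token
            return True
    return False


def word_overlap(candidate, previous):
    """Semantic feature: Jaccard overlap between the two word sets."""
    a, b = set(tokens(candidate)), set(tokens(previous))
    return len(a & b) / len(a | b) if a | b else 0.0


previous = "I want to fly to Boston"
candidate = "No, I want to fly to Austin"
print(has_cue(candidate), len(tokens(candidate)))
print(round(word_overlap(candidate, previous), 3))
```

High overlap together with a cue word is the typical signature of a correction: the user repeats most of the prior utterance and changes one part.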

+Dialog Policy

• Confirmation
  • Explicit confirmation
      S: Which city do you want to leave from?
      U: Baltimore.
      S: Do you want to leave from Baltimore?
      U: Yes.
  • Implicit confirmation
      U: I want to travel to Berlin
      S: When do you want to travel to Berlin?
• Rejection
  • “Sorry I didn’t understand that”
  • How many times do you ask?
  • Progressive Prompting
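Progressive prompting can be sketched as a list of increasingly detailed re-prompts with a cap on attempts. The prompt wording, the cap, and the function name below are all hypothetical; the slide itself leaves open how many times to ask.

```python
# Hypothetical prompts: each rejection leads to a more detailed re-prompt.
PROMPTS = [
    "When would you like to leave?",
    "Sorry, I didn't get that. Please say the month and day you'd like to leave.",
    "Sorry, I still didn't understand. For example, you can say 'March third'.",
]
MAX_ATTEMPTS = len(PROMPTS)  # assumed cap on how many times to re-ask


def prompt_for(attempt):
    """Return the prompt for a 0-based attempt, or None once the cap is hit."""
    if attempt >= MAX_ATTEMPTS:
        return None  # give up, e.g. hand the caller to a human agent
    return PROMPTS[attempt]


for attempt in range(MAX_ATTEMPTS + 1):
    print(attempt, prompt_for(attempt))
```

Each re-prompt tells the user more about what the system can understand, instead of repeating the same rejection message.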

+Policy & Confidence

• Use thresholds

< α   low confidence        reject
≥ α   above threshold       confirm explicitly
≥ β   high confidence       confirm implicitly
≥ γ   very high confidence  don’t confirm at all
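The threshold policy (reject below α, confirm explicitly from α, confirm implicitly from β, skip confirmation from γ) translates directly into a small function. The default threshold values here are placeholders, not values from the slides; in practice they would be tuned on data.

```python
def confirmation_action(confidence, alpha=0.3, beta=0.6, gamma=0.9):
    """Map a confidence score to an action using three thresholds.

    The defaults for alpha, beta, and gamma are illustrative assumptions.
    """
    if not alpha <= beta <= gamma:
        raise ValueError("thresholds must satisfy alpha <= beta <= gamma")
    if confidence < alpha:
        return "reject"
    if confidence < beta:
        return "confirm explicitly"
    if confidence < gamma:
        return "confirm implicitly"
    return "don't confirm"


for score in (0.1, 0.3, 0.7, 0.95):
    print(score, confirmation_action(score))
```

Raising α makes the system reject more often, while lowering γ makes it skip confirmation more readily, so the thresholds trade off user effort against the cost of acting on a misrecognition.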

+Natural Language Generation

• Content Planning: What to say
• Sentence Realization: How to say it
• Reality: Fill in a frame
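"Fill in a frame" usually means template-based generation. A minimal sketch, using the confirmation prompts from the dialog policy slide as templates; the act names and the `realize` helper are hypothetical.

```python
# Templates keyed by act name (names are assumptions); the wording follows
# the explicit and implicit confirmation examples from the policy slide.
TEMPLATES = {
    "confirm_explicit": "Do you want to leave from {origin}?",
    "confirm_implicit": "When do you want to travel to {dest}?",
}


def realize(act, **slots):
    """Sentence realization by filling a fixed template with slot values."""
    return TEMPLATES[act].format(**slots)


print(realize("confirm_explicit", origin="Baltimore"))
print(realize("confirm_implicit", dest="Berlin"))
```

Content planning decides which act and slots to use; the template then fixes how it is said.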

Ethical Issues in Dialog System Design

• Machine learning systems replicate biases that occurred in the training data.
• Microsoft's Tay chatbot
  • Went live on Twitter in 2016
  • Taken offline 16 hours later
  • In that time it had started posting racial slurs, conspiracy theories, and personal attacks
  • Learned from user interactions (Neff and Nagy 2016)

Ethical Issues in Dialog System Design

• Machine learning systems replicate biases that occurred in the training data.
• Dialog datasets
  • Henderson et al. (2017) examined standard datasets (Twitter, Reddit, movie dialogs)
  • Found examples of hate speech, offensive language, and bias
    • Both in the original training data, and in the output of chatbots trained on the data.

Ethical Issues in Dialog System Design: Privacy

• Remember this was noticed in the days of Weizenbaum
• Agents may record sensitive data
  • (e.g. “Computer, turn on the lights [answers the phone] Hi, yes, my password is...”)
  • Which may then be used to train a seq2seq conversational model.
• Henderson et al. (2017) showed they could recover such information by giving a seq2seq model keyphrases (e.g., "password is")

Ethical Issues in Dialog System Design: Gender equality

• Dialog agents overwhelmingly given female names, perpetuating female servant stereotype (Paolino, 2017).
• Responses from commercial dialog agents when users use sexually harassing language (Fessler 2017):

Summary

• State of the art:
  • Chatbots:
    • Simple rule-based systems
    • IR or neural networks: mine datasets of conversations.
  • Frame-based systems:
    • Hand-written rules for slot fillers
    • ML classifiers to fill slots
• What’s the future?
  • Key direction: Integrating goal-based and chatbot-based systems