Human-Computer Interaction
CSG 170 – Round 10

Prof. Timothy Bickmore

11/26/2008

Quiz

Quiz – background

Discourse Analysis Notation

• One utterance per line.
• Brackets represent overlapped behavior

1. A: Did you really [mean that]?
2. B: [yes]

• Parentheses used for nonverbal behavior and other notations

1. A: (eyebrow raise) Really?
2. B: (inaudible)

• Ellipsis used for pauses ‘…’

Quiz

• Characterize the following 15 seconds of dialog
• Diane (on left) is a real estate agent, Beth & Ned Flood are new clients.
• Video will be played 10 times
• You will then have 5 minutes to analyze & write up your results

Hint: think about each communicative modality separately, starting with speech

System Prototyping / Heuristic Evaluation

Quick Status Reports

Human Dialogue

Hierarchical Structure of Language

Analyses: Phonological, Morphological, Lexical, Syntactic, Semantic, Pragmatic

Structures: Sounds, Phonemes, Morphemes, Words, Syntactic Structures, Representation Structures, Communicative Intentions

Discourse, Dialogue & NVB are all pragmatic concerns

• Intimately depend on context
• Discourse is concerned with
  • The combinatorial meaning of utterances (internal context)
  • Other contextual phenomena (external context)
    • Deixis
    • Social deixis
    • Grounding & referring
    • Etc.

Conversational behavior vs. function

• Eyebrow raise is a conversational behavior
• Emphasis is a conversational function
• There is a many-to-many mapping between behaviors and functions

Behaviors: Eyebrow raise, BEAT gesture, Intonation pitch peak
Functions: Emphasis, Affective display
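The many-to-many mapping can be made concrete with two lookup tables, one per direction. This is only a sketch: the eyebrow pairs follow the "Eyebrows" slide in this deck, while the pairs for the beat gesture and the pitch peak are assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative (behavior, function) pairs. The eyebrow pairs follow the
# "Eyebrows" slide; the remaining pairs are assumptions for this sketch.
PAIRS = [
    ("eyebrow raise", "emphasis"),
    ("eyebrow raise", "affective display"),
    ("beat gesture", "emphasis"),
    ("intonation pitch peak", "emphasis"),
]

def build_indexes(pairs):
    """Index the pairs in both directions."""
    functions_of = defaultdict(set)   # behavior -> functions it can serve
    behaviors_for = defaultdict(set)  # function -> behaviors that can realize it
    for behavior, function in pairs:
        functions_of[behavior].add(function)
        behaviors_for[function].add(behavior)
    return functions_of, behaviors_for

functions_of, behaviors_for = build_indexes(PAIRS)
print(sorted(functions_of["eyebrow raise"]))
print(sorted(behaviors_for["emphasis"]))
```

One behavior maps to several functions and one function to several behaviors, so neither table alone captures the relation.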

Types of Conversational Function

• Propositional
• Interactional
• Affective
• Attitudinal
• Relational
• …

Proxemics

• Engagement & disengagement
• Social distance
• Immediacy behavior

Eye gaze

• Attention
• Deictic
• Turn-taking

Eyebrows

• Emphasis
• Affective displays

Headnods

• Emphasis
• Greetings
• Backchannels
• Acknowledgements

Hand gesture

• Form (from McNeill)
  • Beat
  • Deictic
  • Iconic
  • Metaphoric
• Function
  • Emphasis
  • Propositional/Semantic
  • Turn-taking / interruption

Turn-taking

• Interlocutors cannot talk at once
• Cues for ‘giving turn’
  • Gaze at next speaker
  • Pause
  • Rising end intonation
• Cues for ‘taking turn’
  • Speaking
  • Gesturing

REA: Embodied Conversational Agent

Grounding

• Process by which interlocutors come to a shared understanding of what is said
• A collaborative process
• Mechanisms
  • Requests for acknowledgement
  • Acknowledgements
    • Can be contingent move
  • Request for repair
  • Repair

Discourse understanding

How does hearer understand?

[Diagram labels: Speaker, Hearer, Communicative behavior, Discourse Context]

Assumptions
• Physical context
• Language abilities
• “Commonsense”
• Cultural context
• Gricean maxims (quantity, quality, relevance, etc.)
• etc.

Specifications for updating discourse context given assumptions

Speech act theory

• Our utterances are actions
  • Bless, Vow, Curse
  • Assert, Promise, Request, Reply, etc.
• Locutionary act (uttering the words)
• Illocutionary act (speech act)
• Perlocutionary act (ultimate effect)
• e.g. “It’s cold in here.”

Other phenomena

• Anaphora / Cataphora
  • John ate an apple. He liked it.
  • If you need one, there’s a towel in the top drawer.
• Ellipsis
  • Anything wrong?

Representations of Dialog as used in dialog systems

• State transition networks
  • Hierarchical ATNs
• Rule-based
  • Chunks of dialog are represented by a partially-ordered sequence of states
    • e.g. adjacency pairs
  • Chunks are negotiated and deployed dynamically at runtime

Grosz & Sidner ’86
Attention, Intentions, and the Structure of Discourse

• Discourse can be partitioned into discourse segments
• The segments have a hierarchical structure
• This structure is dynamically determined
• Interlocutors must maintain a stack representing where they are in the hierarchy
• This stack is used to efficiently generate and resolve references
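The stack idea can be sketched in a few lines. This is a deliberately simplified illustration, not Grosz & Sidner's formalism or COLLAGEN's implementation: the class name `FocusStack`, its methods, and the entity strings are hypothetical, and the segment names are borrowed from the exercise-coach example in these slides.

```python
class FocusStack:
    """Stack of open discourse segments, innermost segment on top."""

    def __init__(self):
        # Each item is (segment purpose, {referring expression: entity}).
        self._segments = []

    def push(self, purpose):
        """Open a new (sub)segment."""
        self._segments.append((purpose, {}))

    def pop(self):
        """Close the innermost segment and return its purpose."""
        return self._segments.pop()[0]

    def mention(self, expression, entity):
        """Record an entity as salient in the innermost open segment."""
        self._segments[-1][1][expression] = entity

    def resolve(self, expression):
        """Search from the top of the stack down for the nearest match."""
        for _purpose, entities in reversed(self._segments):
            if expression in entities:
                return entities[expression]
        return None

stack = FocusStack()
stack.push("DiscussPreviousDay")
stack.mention("it", "yesterday's walk")   # hypothetical entity
stack.push("ShowGraph")
stack.mention("it", "the weekly graph")   # hypothetical entity
inner = stack.resolve("it")               # resolved inside ShowGraph
stack.pop()
outer = stack.resolve("it")               # resolved after ShowGraph is closed
```

Because lookup only walks the open segments, closing a segment removes its entities from consideration, which is what keeps reference resolution cheap.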

Example Grosz & Sidner / COLLAGEN

[Diagram: three linked views of one coaching dialog. Numbered markers 1, 2 and 3 tie nodes of the intentional structure to segments of the linguistic structure.]

Intentional Structure (node labels): Opening, DiscussPreviousDay, DiscussNextDay, Closing, EvaluateExercise, Ask Steps, ShowGraph, DiscussGraph

Attentional State (stack contents): DiscussPreviousDay, EvaluateExercise

Linguistic Structure

Interacting about exercise.
Done opening interaction.
Discussing previous day.
Done client identifying steps walked yesterday as 1000.
Coach says “How many steps did you walk yesterday?”
Client says “1000.”
Coach showing graph for the week.
Coach says “Here’s how you’ve done this week.”
Next expecting coach to show graph for the week.
Expecting to discuss graph.
Expecting to discuss next day.
Expecting to close interaction.

State of the art – Interactive Voice Response (IVR) systems

Current status of IVR systems

• Speech Recognition: accuracy of 85-90%+
• Speech Synthesis: high quality for short utterances
• Dialog Management: strongly-managed dialog flow
• Language Processing: Very limited
• Language Generation: Very trivial

Current IVR Apps

• Directory assistance
• Taxi bookings
• Stock transactions
• Remote banking
• Travel reservations
• Auto-attendants
• Pizza ordering
• And…

State of the Art: Scripted Interactions

• Scripts written by teams of experts
• Represented as flow charts or augmented transition networks
• Implemented as state machines / ATNs / VXML

[Flow chart: GetCommitment script with states GC_START, GC_1 to GC_22 and GC_END, plus transitions into MotivateDuration and MotivateToExercise. Transitions are labeled with user replies and with guards: (time2bed < 2), (time2bed > 2), (below exp), (at exp), (above exp), (if REL & know location), (if REL & know sport), (if REL & know buddy), (else), (loner), (no illness/injury), (illness|injury), (null).]

Agent prompts in the chart:
• Are you going to get some [more] exercise today?
• Are you going to workout tomorrow?
• What kind of exercise are you going to do?
• (TEXTENTRY) What kind of exercise?
• Great. How much aerobic exercise do you plan to do?
• Great. How long do you plan to go for?
• How long do you plan to play for?
• Which one?
• X again?
• Where are you going to walk?
• Are you going to (location X) again?
• Are you going to go with anyone?
• Who?
• Are you going to go with X again?
• Do you think you can go for X minutes?
• Do you think you can increase your time a little today/tomorrow?
• Can you keep up the same time as yesterday/etc?
• You shouldn't try to do so much so soon... How about X minutes this time?
• OK, but you should try to increase gradually.
• Is it because of your illness/injury?
• Great.

User choices in the chart:
• I'm going to workout at the gym.
• I'm going to go for a walk.
• I'm going to play a sport.
• Something else.
• Yep / Yes / No / OK
• I can't
• No, I really want to.
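A scripted interaction of this kind can be sketched as a small state machine. The three prompts below are taken from the GetCommitment chart, but the state names, the reply keywords, and the transitions are simplified assumptions and do not reproduce the actual script.

```python
# Each state has a prompt and a table mapping a user reply to the next state.
# State names and transitions are hypothetical simplifications.
SCRIPT = {
    "START": {
        "prompt": "Are you going to get some exercise today?",
        "next": {"yes": "ASK_KIND", "no": "END"},
    },
    "ASK_KIND": {
        "prompt": "What kind of exercise are you going to do?",
        "next": {"walk": "ASK_DURATION", "gym": "ASK_DURATION"},
    },
    "ASK_DURATION": {
        "prompt": "Great. How long do you plan to go for?",
        "next": {},  # no table: any reply ends this sketch
    },
}

def run_script(script, replies, start="START", end="END"):
    """Walk the network and return the list of prompts that were spoken."""
    state, spoken = start, []
    replies = iter(replies)
    while state != end:
        node = script[state]
        spoken.append(node["prompt"])
        reply = next(replies, None)
        if reply is None:
            break  # caller hung up / no more input
        if not node["next"]:
            state = end
        else:
            # An unrecognized reply keeps the dialog in the same state,
            # so the prompt is repeated on the next pass.
            state = node["next"].get(reply.lower(), state)
    return spoken

prompts = run_script(SCRIPT, ["yes", "walk", "30 minutes"])
```

The system keeps the initiative throughout: the user can only choose among the transitions the script author anticipated, which is the "strongly-managed dialog flow" typical of IVR systems.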

Commercial & Research Tools Available

• Intervoice InVision Studio

Also:

• Telera DeVXchange AppBuilder
• Audium Audium3
• VoiceGenie GenieBuilder

VoiceXML: What is it?

• Language for specifying voice dialogs
• Output:
  • Prerecorded audio and text-to-speech (TTS)
• Input:
  • Touch-tone keys and Automatic Speech Recognition (ASR)
• Extension of XML
• Designed to interact with web-based applications

History

• 1995
  • Phone Web project by AT&T Research
• 1999
  • Lucent and AT&T have incompatible dialects of Phone Markup Language
  • So, VoiceXML Forum created with AT&T, Lucent, Motorola, and IBM
  • Team develops VoiceXML 0.9, a first pass at standardization
• 2000
  • VoiceXML 1.0 was created and submitted to World-Wide Web Consortium (W3C)
• 2001
  • VoiceXML 2.0 by W3C’s Voice Browser Working Group

Big Picture: The phone as a web browser

VXML is a kind of XML

• Tags and body

  <cmu>
    <welcome>Welcome to CMU!</welcome>
    <ecom>
      <welcome>Welcome to the E-commerce!</welcome>
    </ecom>
  </cmu>

• zero or more attributes

  <welcome accent="texan">Welcome</welcome>
  <welcome accent="pittsburgh">Welcome</welcome>

• Tag with no body

  <breath/>

A simple example

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
    <form>
      <field name="goingtoBeach" type="boolean">
        <prompt>
          Are you going to Daytona Beach this year?
        </prompt>
        <filled>
          Ohh...
          <if cond="goingtoBeach">
            That's great!
            <goto next="goingDocument.vxml" />
          <else />
            bummer maybe next year.
            <goto next="notgoingDocument.vxml" />
          </if>
        </filled>
      </field>
    </form>
  </vxml>

The slides walk through the same document three times, highlighting:

• ASR grammar & TTS prompt: the <field> with its built-in boolean type, and the <prompt>
• Execution conditioned on ASR recognition: the <filled> block
• if/else with goto: the <if> / <else /> branches and their <goto> elements

Custom Input Grammar

  <form>
    <field name="month">
      <prompt>…</prompt>
      <grammar src="month.grxml"/>
    </field>
  </form>

Many formats for grammars, both ASR and DTMF.

Recorded audio instead of TTS

  <prompt>
    Hello! Thanks for calling.
    <audio src="welcome.wav"/>
  </prompt>

Some error handling

  <noinput>
    I did not hear anything. Please try again.
    <reprompt/>
  </noinput>

  <nomatch>
    I did not recognize that character. Please try again.
    <reprompt/>
  </nomatch>

Evaluation of IVRs

• PARADISE
  • Decision-theoretic framework to derive a single metric that takes system accuracy, efficiency and subjective factors into account
• Check lists
  • TRINDI
  • DISC
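As a rough sketch of how PARADISE combines measures into one number: performance is a weighted task-success term minus a weighted sum of dialog costs, with each measure first normalized to z-scores. In the framework the weights are estimated by regressing user satisfaction ratings on the normalized measures; here the weights, the cost names, and the data are all made up for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Normalize values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def paradise_performance(task_success, costs, alpha, cost_weights):
    """Return one performance score per dialog.

    task_success: task-success values (e.g. kappa), one per dialog
    costs: dict of cost name -> list of values, one per dialog
    alpha, cost_weights: assumed weights (normally fitted by regression)
    """
    n_success = z_scores(task_success)
    n_costs = {name: z_scores(vals) for name, vals in costs.items()}
    scores = []
    for i in range(len(task_success)):
        penalty = sum(cost_weights[name] * n_costs[name][i] for name in costs)
        scores.append(alpha * n_success[i] - penalty)
    return scores

# Hypothetical measurements for three calls to an IVR system.
kappa = [0.9, 0.6, 0.3]
costs = {"turns": [8, 12, 20], "asr_rejections": [0, 2, 5]}
scores = paradise_performance(
    kappa, costs, alpha=0.5,
    cost_weights={"turns": 0.3, "asr_rejections": 0.2},
)
```

Normalizing first is what lets measures on different scales (a kappa between 0 and 1, a turn count in the tens) be traded off with comparable weights.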

“TRINDI Tick List”

(A subset)

Q1: Is Utterance Interpretation Sensitive to Context?

S: When do you want to arrive?
U: Tomorrow.
S: You want to arrive on Thursday October 31st.

Context = Time, Space, Person, etc.
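A minimal sketch of what context-sensitive interpretation involves for the time case: the word "tomorrow" only becomes a date once the system consults the current date of the dialog. The function names and the offset table are hypothetical, and the dialog date of 30 October 2002 is an assumption chosen so that the answer matches the slide.

```python
from datetime import date, timedelta

# Hypothetical table of deictic day expressions and their day offsets.
DAY_OFFSETS = {"today": 0, "tomorrow": 1, "yesterday": -1}

def resolve_day(expression, now):
    """Resolve a deictic day expression relative to the dialog date `now`."""
    key = expression.strip().lower().rstrip(".")
    if key not in DAY_OFFSETS:
        raise ValueError(f"cannot resolve {expression!r} without more context")
    return now + timedelta(days=DAY_OFFSETS[key])

def confirm_arrival(expression, now):
    """Build the system's confirmation with the resolved, explicit date."""
    day = resolve_day(expression, now)
    return f"You want to arrive on {day.strftime('%A %B')} {day.day}."

# Assumed dialog date: Wednesday 30 October 2002.
reply = confirm_arrival("Tomorrow.", date(2002, 10, 30))
print(reply)
```

Echoing the resolved date back to the user also serves as a grounding move: the user can repair the interpretation if the system's notion of "now" was wrong.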

Q2: Is Over-answering Possible?

S: Where do you want to go to?
U: Boston at 3pm.

Q3: Can Answers be for Unasked Questions?

S: Where do you want to go?
U: I'm leaving New York on the 29th June.

Q4: Can Answers be Underinformative?

S: On what date do you want to go?
U: Sometime next week.
S: I need to know which day next week.
U: Tuesday.

Q5: Can the User use Ambiguous Designators?

S: Where do you want to go?
U: London.
S: London Heathrow or London Gatwick?
U: Heathrow.

Q6: Can Answers Provide Negative Information?

S: When do you want to leave?
U: Not before lunchtime.

Q7: Can Help be Asked For in Appropriate Ways?

S: What meal do you want?
U: What are the choices?
S: Standard, vegetarian, Kosher and Halal.
U: Halal, please.

Q8: Can the User Initiate Subdialogs?

S: Where do you want to go?
U: Are there any economy flights to Boston today?
S: Yes.
U: I want to go to Boston today and New York on Thursday.

Q9: Can the System Reformulate an Utterance?

S: Do you want to fly cabin class?
U: I don't understand.
S: There are two classes of seats, business and cabin; cabin is also known as economy class. Do you want to fly cabin class?

Additional questions

• Q10 – Can the system deal with inconsistent information?
• Q11 – Can the system deal with belief revision?
• Q12 – Can the system deal with no answer to a question at all?
• Q13 – Can the system repeat an utterance on request?
• Q14 – Does the system make it explicitly clear that it is not a human?
• Q15 – Can the system keep track of multiple entities (e.g., routes) at the same time?

Homework I7 – Part I

• Evaluate a commercial IVR system
  • Call 1-800-FANDANGO
  • Characterize the dialog for checking the times of a particular movie at a particular theatre as a hierarchical STN. Start with the top level and work down. No more than 30 states.
  • Evaluate the TRINDI tick list items (yes or no) presented in class.
  • EMAIL your results.

Homework I7 – Part II

• Write Voice XML
  • Get the Voxeo 'helloworld' VXML tutorial application working. This involves signing up for a Voxeo developer account.
    • NOTE: host your vxml pages on your own web server (not Voxeo's).
  • Write a new VXML application to tell 'knock knock' jokes
    • As many as the user wants to hear - repeating after 3 is OK.

To do

• Read
  • Dix Ch 19 - CSCW
  • 3 CSCW papers
• Homework I8: call 1-800-FANDANGO & VXML
• T7 - software prototype
  • Address feedback from heuristic evaluations
    • assign each a severity rating (cosmetic, minor, major, catastrophic)
    • brainstorm possible solutions
  • Modify your system to correct as many of the problems found as possible (in priority order)
  • document how you do this.
• Get started on final report
  • CHI format

Papers: Speech Interfaces

• Similarity is More Important than Expertise: Accent Effects in Speech Interfaces - CHI'07 [Tracy]
• Dealing with System Response Times in Interactive Speech Applications - CHI'05 [Bhavna]
• Shaping User Input in Speech Graffiti: a First Pass – CHI’06 [Arpit]