25
1 Dialog systems Rolf Carlson, CTT, KTH Multi Disciplinary Recognition Synthesis Understanding Generation Dialog Control Production Perception Linguistics Knowledge Human- Machine Programming Phonetics Statistics Signal Processing Semantics Physiology Agenda Dialog Systems Data Collection Recognition Understanding • Disfluency Generation, Vocabulary Dialog models Spontaneous Data • Platforms • Evaluation Error Handling • Challenges Classic systems Research systems – Voyager (1989) – ATIS (1992) – SUNDIAL (1993) – TRAINS (1996) • Application – Philips Train Information (1995) Large Efforts – Communicator – Verbmobil Nordic Scene Stockholm, Sweden – Waxholm Linköping, Sweden – LINLIN Göteborg, Sweden – TRINDI Aalborg, Denmark Helsinki, Finland Trondheim, Norway Dialog systems at KTH

Dialog systems at KTH

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Dialog systems at KTH

1

Dialog

syst

ems

Rolf C

arlson

, CT

T, KT

H

Multi D

isci

plin

ary

Reco

gnit

ion

Synt

hesi

s

Und

erst

andi

ngGe

nera

tion

Dia

log

Cont

rolPr

oduc

tion

Perc

epti

on

Ling

uist

ics

Know

ledg

e

Hum

an-

Mac

hine

Prog

ram

min

g

Phon

etic

s

Stat

isti

cs

Sign

al P

roce

ssin

g

Sem

anti

csPhys

iolo

gy

Agen

da

•D

ialo

g S

yste

ms

•D

ata

Colle

ctio

n•

Rec

ognitio

n U

nder

stan

din

g•

Dis

fluen

cy•

Gen

erat

ion,

Voca

bula

ry•

Dia

log m

odel

s•

Sponta

neo

us

Dat

a•

Pla

tform

s•

Eva

luat

ion

•Err

or

Han

dlin

g•

Challe

nges

Cla

ssic

sys

tem

s

•Res

earc

h s

yste

ms

–Voy

ager

(1989)

–ATIS

(1992)

–SU

ND

IAL

(1993)

–TRAIN

S (

1996)

•Applic

atio

n–

Phili

ps

Tra

in I

nfo

rmation

(1995)

•La

rge

Eff

ort

s–

Com

munic

ator

–Ver

bm

obil

Nord

ic S

cene

•Sto

ckholm

, Sw

eden

–W

axhol

m

•Li

nkö

pin

g,

Sw

eden

–LI

NLI

N

•G

öte

borg

, Sw

eden

–TRIN

DI

•Aal

borg

, D

enm

ark

•H

elsi

nki, F

inla

nd

•Tro

ndhei

m,

Norw

ay

Dia

log

syst

em

s at

KTH

Page 2: Dialog systems at KTH

2

Dia

log

syst

em

s at

KTH

Dia

log

syst

em

s at

KTH

Dia

log

syst

em

s at

KTH

The

HIG

GIN

S d

omai

n

AdApt

multim

odal

dia

log s

yste

m

Mul

tim

odal

dia

log

syst

em

Conv

ersa

tion

sab

out

appa

rtm

ents

for

sale

Wor

kto

geth

erw

ith

a an

imat

edag

ent,

Urb

an

Som

ere

searc

h iss

ues

•M

ultim

odal dia

log m

odel

ling

–Spee

chSyn

thes

is,

Anim

atio

n,

Turn

taki

ng

–Spee

chRec

ognitio

n,

Poin

ting

–Ref

eren

ceH

andlin

g

•Err

or

Han

dlin

g•

Adap

tivity

Dia

log P

hen

om

ena

•“H

ar

du

in

get

bil

lig

are

?”

Implic

it r

efer

ence

, el

lipse

, co

nte

xt

•“B

erä

tta m

er

om

den

an

dra

läg

en

hete

n!”

Met

a-r

efer

ence

•“V

ad

men

ar

du

med

ch

arm

ig?”

Dom

ain-q

ues

tion

Page 3: Dialog systems at KTH

3

Simulation(W

izard-of-Oz)

User

Hum

anoperator

Wiz

ard

of O

z

•H

ow m

uch

doe

sth

e w

izar

d,

WO

Z,

take

care

of•

The

Com

ple

teSys

tem

•Pa

rts

of t

he

syst

em–

Rec

ognitio

n–

Syn

thes

is–

Dia

log H

andlin

g–

Know

ledge

Bas

e

•W

hic

hdem

ands

on t

he

WO

Z–

How

to h

andl

eer

rors

–Shou

ldyo

u a

dd

info

rmat

ion

–W

hat

is a

llow

edto

say

•W

hic

hsu

pport

doe

sth

e W

OZ h

ave

Wiz

ard

-of-

Oz

data

colle

ctio

n

The

Wiz

ard’

s gr

aphi

cal i

nter

face

Early

dem

oPic

torial sc

enarios

Adap

t –

dem

onst

ration

of ”c

omple

te”

syst

em

Page 4: Dialog systems at KTH

4

Th

e W

axh

olm

Pro

ject

•to

uris

t in

form

atio

n•

Stoc

khol

m a

rchi

pela

go•

tim

e-ta

bles

, hot

els,

hos

tels

, cam

ping

an

d di

ning

pos

sibi

litie

s.•

mix

ed in

itia

tive

dia

logu

e•

spee

ch r

ecog

niti

on•

mul

tim

odal

syn

thes

is•

grap

hic

info

rmat

ion

•pi

ctur

es, m

aps,

cha

rts

and

tim

e-ta

bles

.

Waxh

olm

Sys

tem

Gram

mat

ik &

sem

antik

Dialogk

ontr

oll

Graf

ik

Igen

känn

ing

Aku

stisk

och

visu

ell

Talsy

ntes

Dat

abas

-Sök

ning

Ljud

Kart

or o

ch t

abel

ler

Bått

idta

belle

r,H

amnp

osit

ione

r,H

otel

l,Re

stau

rang

er,

Mm

.

Inmat

ning

Utm

atning

Kont

extk

änsl

iga

Regl

er o

ch n

ätve

rkLe

xiko

n

Sim

uler

ing

med

W

izar

d

Insp

elni

ngar

Tal

Dat

abas

Ord

Ord

klas

ser

Sem

antis

k in

form

atio

nUt

tal

Tal

The

Wax

holm

inte

rfac

e

Waxh

olm

Data

base

•About

70 s

ubje

cts

(9200 w

ord

s)•

Phonet

ical

ly t

ransc

ribed

•Exa

mple

s fr

om

the

Wax

holm

sys

tem

–Fi

ve d

iffe

rent

spea

kers

•EJ

KR

G

Ö

LN

M

K

Wor

d c

over

age

00,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1,0

010

020

030

040

050

060

070

0Num

ber

of w

ords

CoverageLe

xico

n -

tran

scription

Word

freq.

Word

freq.

skärgården

69skulle

33SJ'Ä3R

GÅ:2N

15SKk”ULE0

26

SJ'Ä3R

GgÅ

:2N

6SKk”U

3SJ'Ä3G

gÅ:2N

5SKk”UL

2SJӀ3R

#GgÅ̀:2DdE0N

5SKkLE0

1SJ'Ä3R

GgÅ

:2DdE0N

4SKk”UE0

1SJ'Ä3R

GgÅ

:2DE0N

4SJӀ3R

#GgÅ̀:2DE0N

4SJ'Ä3G

gÅ:2DE02N

32S'Ä3R

GgÅ

:2N

3SJ'Ä3R

GgÅ

:N2

SJӀ3R

#GgÅ̀:2DdE02N

2SJӀ3R

#GÅ̀:2DE02N

2SJ”Ä3#GgÅ̀:2DdE0N

2SJ'Ä3R

Å:NG

1SJ'Ä3R

GÅ:N

1SJ'Ä3R

GÅ:2DdE02N

1SJ'Ä3R

GÅ:2DE0N

1SJ'Ä3R

GÅ:2DE02N

1SJ'Ä3G

gÅ:2DdE0N

v1

SJ'Ä3G

gÅ:2DE0N

1SJ'Ä3G

Å:2DE0N

1SJӀ3R

#GgÅ̀:RDdE0N

1

Page 5: Dialog systems at KTH

5

Utt

erance

len

gth

dis

trib

ution

0

50

100

150

200

250

300

350

05

10

15

20

Utterancelength(numberofwords)

NumberofoccurrencesExtr

a lin

gu

isti

c so

un

ds

32(.3%)

60(.6%)

67(.7%)

117(1.3%)

152(1.6%)

230(2.5%)

110

100

1000

interruptedword

exhalation

hesitation

inhalation

smack

insertedvowel

NumberofoccurrencesintheWaxholm

database

(9200words)

Thre

e ye

ars

late

r….

Th

e W

axh

olm

syst

em

Whe

reca

nIf

ind

hote

ls?

Info

rmat

ion

abou

thot

els

issh

own

inth

ista

ble

Whi

chho

tels

are

inW

axho

lm?

Info

rmat

ion

abou

tthe

hote

lsin

Wax

holm

issh

own

inth

ista

ble

Whe

reis

Wax

holm

?W

axho

lmis

show

non

this

map

Ithi

nkIw

antt

ogo

toW

axho

lmIa

mlo

okin

gfo

rboa

tsto

Wax

holm

From

whe

redo

you

wan

tto

goTh

eci

tyW

hich

day

ofth

ew

eek

doyo

uw

antt

ogo

?Iw

antt

ogo

tom

orro

wTh

ere

are

lots

ofbo

ats

from

Stoc

khol

mto

Wax

holm

ona

Frid

ay,A

twha

ttim

edo

you

wan

tto

go?

Whe

ndo

the

even

ing

boat

sde

part?

This

isa

tabl

eof

the

boat

s...

Isit

poss

ible

toea

tin

Wax

holm

?In

form

atio

nab

outt

here

stau

rant

sin

Wax

holm

issh

own

inth

ista

ble

Than

kyo

uTh

ank

you

too

Inst

ead

of W

OZ

”Boos

trap

” th

e sy

stem

•M

ake

a si

mple

but

com

ple

tesy

stem

and e

valu

ate

•Spre

adth

e in

form

atio

n…

.•

Colle

ctdat

a•

Upgra

de

the

syst

em

Phili

ps

trai

n info

rmat

ion

020

7575

75

Page 6: Dialog systems at KTH

6

Web

GALA

XY D

ispla

y

System

Status

Recognition

Paraphrase

Domain

Input

Status

Abort

Connection

Inform

ation

MIT-LCS

Tra

inin

g o

f Ju

piter

Eve

ngen

eric

data

bas

esare

im

port

ant

Sw

edia

Spee

chD

at

……

Sw

edis

h d

iale

cts

”Fly

get,

tåg

et o

ch b

ilbra

nsch

en t

ävla

r om

löns

amhe

t oc

h fo

lket

s gu

nst”

.

Född

iU

SA

ex-J

ugos

lavi

en

Spee

chunder

standin

gso

me

aspec

ts

•Big

ram

Tig

ht

couplin

g•

Key

wor

dsp

otting

•Ph

rase

spot

ting

•Fu

ll gra

mm

atic

alan

d s

eman

tic

anal

ysis

•O

OV o

ut

of v

ocab

ula

ry

Per

ple

xity

of th

e la

nguage

Bperp

lexi

tyfo

r th

e a

pplic

ation

Hentr

opy

for

the a

pplic

ation

P(W

)pro

bab

ility

of

a w

ord

giv

en its

pre

ceedin

gco

nte

xt

HPW

PW

W

=−

∀�(

)log

()

2

B=2H

Page 7: Dialog systems at KTH

7

paus|hej|paus

paus|hej|paus

paus|hej|paus

paus|hej|paus

Auto

mat

ic r

ecognitio

n

paus|hej|paus

dynamic

“analysis”

...a b

grammatical

analysis

semantic

analysis

dialog-

analysis

paus|nej|paus

Auto

mat

ic u

nder

stan

din

g

...a b

grammatical

analysis

dialog-

analysis

paus|nej|paus

dynamic

“analysis”

semantic

analysis

Rep

rese

nting m

ultip

le h

ypoth

eses

Jag

vill

åka

till

Vaxh

olm

Jag

till

åka

till

Vaxh

olm

Jag

vill

åka

vill

Vaxh

olm

Jag

till

åka

vill

Vaxh

olm

Ja v

ill å

ka t

ill V

axho

lmJa

till å

ka t

ill V

axho

lmJa

vill å

ka v

ill V

axho

lmJa

till å

ka v

ill V

axho

lm

N-b

est

list

Wor

d gr

aph

jag

vill

åka

till

Vaxh

olm

jatill

vill

Know

ledge

sourc

es -

Eva

luation

Aco

ust

ic a

naly

sis

Syn

tact

ic a

naly

sis

Sem

antic

analy

sis

Dia

log s

tate

Dia

log C

onte

xt

Conf

iden

ceEx

pect

atio

n Fi

lter

Syn

tact

ic a

nal

ysis

Multi-

leve

l an

alys

is

p(boat|“TOP+SUBJ”)

p(verb|“SUBJ+boat+VP”)

p(TO|“VP+verb+TO_PLACE”)

p(port|“VP+verb+TO_PLACE”)

p(SUBJ|“TOP”)

p(VP|“TOP+SUBJ”)

p(TO_PLACE|“TOP+SUBJ+VP”)

p(END|“SUBJ+VP+TO_PLACE”)

“båten”

boat

“Vaxholm”

“till”

verb

“går”

TO

port

SUBJ

VP

TO_PLACE

TOP

SCORE=Σp n

ode+Σp terminal+Σp w

ord+f(length)

Page 8: Dialog systems at KTH

8

I w

ant

to g

o..

....

12.26

JagvillåkafrånStockholm

tillVaxholm.

11.99

JagvillåkatillVaxholm

frånStockholm.

10.01

JagvillåkatillVaxholm.

9.85

JagskulleviljaåkatillVaxholm.

5.30

Jagvillåka.

3.17

NärgårdetenbåttillVaxholm?

-1.32

NärgårbåtentillVaxholm?

-1.95

Jagvillåkatillmam

ma.

Parserscore

IwanttogofromStockholm

toVaxholm.

IwanttogotoVaxholm

fromStockholm.

IwanttogotoVaxholm.

IwouldliketogotoVaxholm.

Iwanttogo.

WhendoesaboatgotoVaxholm?

WhendoestheboatgotoVaxholm?

Iwanttogotomymother.

Robust

Anal

ysis

•Rob

ust

inte

rpre

tation

–U

sing g

ram

mar

to

auto

mat

ical

ly d

etec

t non

-exp

ecte

d

wor

ds

bet

wee

n a

nd insi

de

phra

ses

–Pe

rfor

ms

bet

ter

than

key

wor

d-s

pot

ting f

or d

etec

ting

erro

neo

us

conte

nt-

wor

ds

–Ska

ntz

e, G

. &

Edlu

nd,

J. (

2004).

Rob

ust

inte

rpre

tation

in

the

Hig

gin

s sp

oken

dia

logue

syst

em.

Dis

fluen

cies

Type

Example

Filledpause

jaghumtyckeromglass

Repetition

Insertion

Restart

Substitution

jagjagtyckeromglass

kandujagtyckeromglass

jagtyckerominteomglass

vityckerjagtyckeromglass

‘Dis

fluen

cy r

ate

•H

um

an

-H

um

an

–Tw

o per

son t

elep

hon

eH

igh

–Tw

o per

son d

irec

t

–O

ne

per

son

•H

um

an

-M

ach

ine

–Com

pute

r in

tera

ctio

nLo

w

S.O

viatt

Dis

trib

ution

of D

isfluen

ices

Swit

ch b

oard

dat

a, L

iz S

hrib

erg,

The

sis,

SRI

DisfluencyPosition

0,00

0,10

0,20

0,30

0,40 0,00

0,20

0,40

0,60

0,80

1,00

PositionofCurrentWord

/TotalWords

ProportionofDisfluencies

5-8

words

9-12words

13-16words

Dis

fluen

cy e

xam

ple

s fr

om A

dapt

rätt

else

det

är li

te f

ör..

.lit

e fö

r se

nt tid

igt

finn

s de

t nå

n eh

m .

..lik

nand

e lg

h …

in/~

områ

det

med

som

är

bygg

d på

180

0tal

et

avbr

utet

hur se

/ eh

...

är k

öket

eh

…ut

rust

at

paus

eruh

m..

.hö

gtti

ll ta

k oc

heh

...

kans

ke n

ågon

kak

elug

n..

.oc

h ba

lkon

g gä

rnaiii

söde

rläg

e

felu

ttal

är d

en e

h ny

redo

~ny

reno

vera

d

förl

ängn

ing

huuu

rrr

ser

gata

n ut

rätt

else

term

jag

vill

gärn

aha

en

läge

nhet

med

...

utsi

kt..

. ne

jmed

bal

kong

Page 9: Dialog systems at KTH

9

Dis

fluen

cies

in h

alf o

f th

e Adap

t co

rpus

22%

of

all u

tter

ance

s di

sflu

ent

6% o

f al

l wor

ds d

isfl

uent

Perc

enta

ge d

isflue

nt w

ords

intu

rns

with

five

to n

ine

word

s

4812

Individu

al u

sers

[%]

4812

1-4

word

s5-

9 wo

rds

10-1

4 wo

rds>

=15

wor

ds

Perc

enta

ge d

isflue

nt w

ords

[%

]

Utt

eran

ce le

ngth

s

Utt

erance

Gen

erat

ion

•Pre

def

ined

utt

eran

ces

•Fr

ames

with s

lots

•G

ener

atio

n b

ased

on g

ram

mar

and

under

lyin

gse

man

tics

Sys

tem

Utt

erance

s

•The

outp

ut

should

ref

lect

the

syst

em’s

vo

cabula

ry a

nd lin

guis

tic

capab

ility

–th

e use

rs a

dap

t

•Short

utt

eran

ces

–The

use

rs a

dap

t

•G

ood e

rror

mes

sages

–U

se w

ords

and p

hra

ses

the

syst

em c

an h

andle

Use

r an

swers

to

qu

est

ion

s?

The

answ

ers

to t

he

ques

tion

:“W

hat

weekd

ay d

o y

ou

wan

t to

go?”

(Vilk

en v

ecko

dag

vill

du å

ka?)

•22%

Fri

day

(fre

dag

)•11%

I w

an

t to

go

on

Fri

day

(jag

vill

åka

fred

ag)

•11%

I w

an

t to

go

to

da

y(j

ag v

ill å

ka idag

)•

7%

o

n F

rid

ay

(på

fred

ag)

•6%

I

wan

t to

go

a F

rid

ay

(jag

vill

åka

en f

redag

)

•-

are

th

ere

an

y h

ote

ls i

n V

axh

olm

?(f

inns

det

någ

ra h

otel

l i Vax

hol

m)

Use

r an

swers

to

qu

est

ion

s?

The

answ

ers

to t

he

ques

tion

:“W

hat

weekd

ay d

o y

ou

wan

t to

go?”

(Vilk

en v

ecko

dag

vill

du å

ka?)

•22%

Fri

day

(fre

dag

)•11%

I w

an

t to

go

on

Fri

day

(jag

vill

åka

fred

ag)

•11%

I w

an

t to

go

to

da

y(j

ag v

ill å

kaid

ag)

•7%

o

n F

rid

ay

(på

fred

ag)

•6%

I

wan

t to

go

a F

rid

ay

(jag

vill

åka

en f

redag

)

•-

are

th

ere

an

y h

ote

ls i

n V

axh

olm

?(f

inns

det

någ

ra h

otel

l i Vax

hol

m)

Pairs

of a

lter

nat

ive

main

ver

bs

höra

-lys

sna

(list

en -

hear

)

sepå

-gå

(wat

ch-

go t

o)

före

drar

-tyc

ker

mes

t om

(pre

fer

-like

the

mos

t)

köpa

-han

dla

(sho

p-bu

y)

test

a-p

röva

(tes

t-tr

y)

vand

ra-s

tröv

a(h

ike-

stro

ll)

Page 10: Dialog systems at KTH

10

Huroftaåkerduutomlandspåsemestern?

jagåkerengångomåretkanske

jagåkerganska

sällanutomlandspåsemester

jagåkernästanalltidutomlandsunderminsemester

jagåkerungefär2gångerperårutomlandspåsemester

jagåkerutomlandsnästanvarjeår

jagåkerutomlandspåsemestern

varjeår

jagåkerutomlandsungefärengångomåret

jagärnästanaldrigutomlands

enellertvågångeromåret

engångpersemester

kanske

engångperår

ungefärengångperår

åtminståneengångomåret

nästanaldrig

Huroftareserduutomlandspåsemestern?

jagreserengångomåretutomlands

jagreserinteoftautomlandspåsemesterdetblirmeraiarbetet

jagreserreserutomlandspåsemestern

vartannatår

jagreserutomlandsengångpersemester

jagreserutomlandspåsemesterungefärengångperår

jagbrukarresautomlandspåsemesternåtminståneengångiåret

engångperårkanske

engångvartannatår

varjeår

varttredjeårungefär

nuförtideninteså

ofta

varjeårbrukarjagåkautomlands

Exempleofquestions

andansw

ers

Res

ults

no reus

e4%

reus

e 52

%

ellip

se18

%othe

r24

%

no answ

er2%

Less

ons

•su

bje

cts

adap

t th

eir

lexi

cal ch

oic

es t

o s

yste

m

ques

tions

•le

ss t

han

5%

of

the

case

s an

alter

nat

ive

main

ve

rb is

use

d in t

he

answ

er

•ad

aptive

lan

guag

e m

odel

and lex

icon in t

he

reco

gniz

er

Dia

log M

odel

Hum

an-

mach

ine

inte

ract

ion

•In

itia

tive

–sy

stem

/use

r•

Wh

o is

the

use

r–

Firs

t tim

e?•

Term

ino

log

y–

join

t vo

cabula

ry•

Do

yo

u a

ccep

t b

arg

e in

?–

Has

the

use

r under

stoo

d w

hat

was

sai

d?

•C

an

th

e u

ser

teach

th

e s

yst

em

?

Modalit

ies

•W

ho a

re y

ou t

alki

ng t

o–

syst

em–

Anim

ated

char

acte

r

•H

ow

is

the

info

rmat

ion p

rese

nte

d–

Tex

t, t

able

s, p

ictu

res

–Syn

thet

ic s

pee

ch

•Can

you b

oth

tal

k an

d p

oin

t

Page 11: Dialog systems at KTH

11

Spoke

ndia

log s

yste

m

•Fi

nite-

stat

ebas

edsy

stem

s–

dia

log a

nd s

tate

sex

plic

itly

spec

ifie

d

•Fr

ame

bas

edsy

stem

s–

dia

log s

epara

ted

from

info

rmat

ion s

tate

s

•Agen

t bas

edsy

stem

s–

mod

elof

inte

ntion

s, g

oals

, bel

iefs

Dia

log m

odel

•D

om

ain

dep

enden

t m

odel

–Rule

s, n

etw

orks

, st

ack

•Sep

arat

e m

odel

s fo

r th

e dia

log t

urn

s an

d t

he

sem

antics

–Fo

r ex

am

ple

Ques

tion

/answ

er

•Ref

eren

ce H

andlin

g

voic

eXM

L

http

://w

ww.v

oice

xml.o

rg/

Nuance

Voy

ager

Waxh

olm

Topic

s

TIM

E_

TA

BLE

Tas

k: g

et a

tim

e-ta

ble

.Exa

mple

: N

ärgår

båt

en?

(When

does

the

boat

lea

ve?)

SH

OW

_M

AP

Tas

k :

get

a c

har

t or

a m

ap d

ispla

yed.

Exa

mple

: Var

lig

ger

Vax

holm

? (W

her

e is

Vax

holm

loca

ted?)

EX

IST

Tas

k :

dis

pla

y lo

dgin

g a

nd d

inin

g pos

sibi

litie

s.Exa

mple

: Var

fin

ns

det

van

drar

hem

? (W

her

e ar

e th

ere

hos

tels

?)

OU

T_

OF_

DO

MA

INTas

k :

the

subje

ct is

out

of t

he

dom

ain.

Exa

mple

: K

an jag

boka

rum

. (C

an I

book

a ro

om?)

NO

_U

ND

ER

ST

AN

DIN

GTas

k :

no

under

stan

din

g o

f use

r in

tention

s.Exa

mple

: Ja

g h

eter

Olle

. (M

y nam

e is

Olle

)

EN

D_

SC

EN

AR

IOTas

k :

end a

dia

log.

Exa

mple

: Tac

k. (

Than

k yo

u.)

Dia

log

ue c

on

tro

l -

state

pre

dic

tio

n

Dia

log g

ram

mar

spec

ifie

d b

y a

num

ber

of st

ates

Eac

h s

tate

ass

oci

ated

with a

n a

ctio

ndat

abas

e se

arc

h,

syst

em q

ues

tion

… …

Pro

bab

le s

tate

det

erm

ined

fro

m s

eman

tic

feat

ure

sTra

nsi

tion p

robab

ility

from

one

stat

e to

sta

teD

ialo

g c

ontr

ol des

ign t

oolw

ith a

gra

phic

inte

rfac

e

Page 12: Dialog systems at KTH

12

Sem

antic

Fram

e

Currentfunctions:

/TO-PLACEQ-VERBALSUBJECTFROM-TIME/

Currentmeaning:/MOVEBOATPORTQUANT/

Historyfunctions:

/TO-PLACEQ-VERBALSUBJECTFROM-TIME/

Historymeaning:/MOVEBOATPORTQUANT/

(FROM-TIME.AFTER_TIME"04"//)

(FROM-TIME.BEFORE_TIME"06"//)

(SUBJECT"båten"/BOAT/)

(Q-VERBAL"går"/MOVE/)

(TO-PLACE"vaxholm"/PORT/)

proposedtopicTIME_TABLE

Topic

sel

ection

TIM

ES

HO

WFA

CILI

TYN

O U

ND

ER-

OU

T O

FEN

DT

ABL

EM

AP

STA

ND

ING

DO

MA

IN

OBJ

ECT

.062

.312

.073

.091

.067

.091

QU

EST

-WH

EN.188

.031

.024

.091

.067

.091

QU

EST

-WH

ERE

.062

.688

.390

.091

.067

.091

FRO

M-P

LACE

.250

.031

.024

.091

.067

.091

AT

-PLA

CE.062

.219

.293

.091

.067

.091

TIM

E.312

.031

.024

.091

.067

.091

PLA

CE.091

.200

.500

.091

.067

.091

OO

D.062

.031

.122

.091

.933

.091

END

.062

.031

.024

.091

.067

.909

HO

TEL

.062

.031

.488

.091

.067

.091

HO

STE

L.062

.031

.122

.091

.067

.091

ISLA

ND

.333

.556

.062

.091

.067

.091

PORT

.125

.750

.244

.091

.067

.091

MO

VE.875

.031

.098

.091

.067

.091

argmax

i{p(ti|F

)}

TOPI

C EX

AM

PLES

FEA

TURE

S

Topic

pre

dic

tion

res

ults

0510

15

All

“nounderstanding”

excluded

12,9

12,7

8,8

8,5

noextra

linguistic

sounds

raw

data

3,1

2,9

complete

parse

%Errors

How

may

I h

elp y

ou?

•Cal

lers

are

route

d t

o s

upport

sta

ff

usi

ng

Nat

ura

l Voic

es t

echnolo

gy,

AT&

T C

onsu

mer

Ser

vice

s' H

ow

May

I H

elp Y

ou?

(HM

IHY).

The

HM

IHY s

yste

m w

as d

eplo

yed in 2

001,

and b

y th

e en

d o

f th

e ye

ar,

it w

as h

andlin

g

more

than

2 m

illio

n c

alls

per

month

.

•Alle

n G

orin e

t al

.

Hum

an-h

um

an c

onve

rsations

Customer

Agent

Act

Freq.

Words

Freq.

Words

Acknowledge

47.9

2.3

30.8

3.1

Request

29.5

9.0

15.0

12.3

Confirm

13.1

5.3

11.3

6.4

Inform

5.9

7.9

27.8

12.7

Statement

3.4

6.9

15.0

6.7

Stat

isti

cs o

f tu

rns

in a

mov

ie d

omai

n (f

rom

Fla

mm

ia).

Conve

rsat

ional “g

runts

•Gruntsoccuranaverageofonce

every5secondsinAmericanEnglish

conversation.(NigelW

ard,2000)

•InSwitchboarddatabase

–um

wasthe6thmostfrequentitem(afterI,and,the,you,anda),(NigelWard,

2000)

–thefouritemsuh,uh-huhandum

andum

-hum

accountedfor4%ofthetotal

(Piconeetal.1998).

Page 13: Dialog systems at KTH

13

Nota

tion

abbreviationandfunction/position

back

back-channel

fill

filler,includingvariousthingsthatoccurutterance-

orturn-initially

dis

disfluency

marker

isisolate,producedwhenneitherpersonhastheturn,

typically

moreself-directedthanother-directed

rsresponse

todirectquestionorhigh-risestatement

cconfirmation,inresponse

toaback-channel

oother,includingclause-finalitems,itemsthatoccur

inquotations,anditemswhose

functionisobscure

Use

r st

udie

s

•Turn-taking

•Interaction

•PositiveandNegativeUserFeedback

•Userreactions

Posi

tive

and n

egat

ive

feed

back

2550

Individu

al u

sers

Utt

eran

ces

with

fee

dbac

k

Sys

tem

Use

rFe

edba

ck

This

bui

ldin

g wa

s co

nstr

ucte

d in

186

1Ye

s ye

sth

at’s

righ

t…..

is

the

re a

tile

d st

ove

ther

e to

oPo

sitive

Att

ention

This

apa

rtm

ent

has

a fi

repl

ace

Yes

that

’s a

ll righ

t to

o……

..

how

high

is t

he b

uild

ing

Positive

Att

itud

e

This

apa

rtm

ent

is o

n th

e fi

rst

floo

rOka

y….

and

I se

e it

is c

lose

to

the

Germ

an c

hurc

h th

erePo

sitive

Att

ention

I do

n’t k

now

any

thin

g ab

out

such

thi

ngs

Well ok

ay……

……….

ye

s bu

t I

thin

k I’

m h

appy

wit

h th

atNeg

ativeA

ttitud

e

This

hou

se w

as b

uilt

in 1

600

Oh!

Neg

ativeA

ttitud

e

94%

of t

he s

ubje

cts

used

fee

dbac

k at

leas

t on

ce65

%of

the

fee

dbac

k tu

rns

were

labe

led

as p

osit

ive

18%

of a

ll us

er u

tter

ance

s co

ntai

ned

feed

back

6%of

the

fee

dbac

k oc

curr

ed in

a s

epar

ate

turn

[%]

Para

met

er

sett

ings

to c

reat

e diffe

rent

stim

uli

Affirmativesetting

Negativesetting

Smile

Headsm

iles

Headhasneutralexpression

Headmovem

ent

Headnods

Headleansback

Eyebrows

Eyebrowsrise

Eyebrowsfrown

Eyeclosure

Eyescloseabit

Eyesopen

widely

F0contour

Declarativeintonation

Interrogativeintonation

Delay

Immediatereply

Slowreply

The

August

sys

tem

•Sto

ckholm

(eve

nts

and g

enera

l in

form

ation)

•Yello

w p

ages

•KTH

and s

pee

ch t

echnolo

gy

•August

Str

indberg

•G

reet

ings

and s

oci

al utt

era

nce

s•

Com

ments

about

the s

yste

m c

apabili

ties

and t

he

dis

cours

e

Shal

low

sem

antic

anal

ysis

•In

put

–w

ord s

equen

ces

–se

man

tic

feat

ure

s fr

om lex

icon

•O

utp

ut

–Acc

epta

ble

utt

eran

ce?

yes

/no

–Pr

edic

ted d

omai

n•

stri

ndberg

, st

ock

holm

, ye

llow

pag

es…

..

–Fe

ature

:val

ue

repre

senta

tion

•obje

ct:r

esta

ura

nt,

pla

ce:m

aria

torg

et

•Tra

ined

on t

agged

N-b

est

lists

and lex

icon

Page 14: Dialog systems at KTH

14

Theset-upin

Kulturhuset

Asamplevideoofthe

systemenvironment

The

August

data

bas

e

men 50

%

wom

en26

%

child

ren

24%

men 55%

wom

en23

%

child

ren

22%

2685

use

rs10

,058

utt

eran

ces

Sept

embe

r 19

98 -

Febr

uary

199

9:

10,0

58ut

tera

nces

(app

roxi

mat

ely 15

hou

rs o

f sp

eech

) we

re m

anua

lly c

heck

ed, t

rans

crib

ed a

nd a

naly

zed

What

do

you s

ay

to A

ugust

?

•Child

Wom

an 1

•W

om

an 2

Utterancetypesinthe

Augustdatabase

Socializing

Social

Insult

Test

Info-seeking

Domain

Meta

Facts

Soci

aliz

ing c

ate

gor

ies

Hel

lo A

ugus

t!Th

at’s

a ni

ce m

oust

ache

!W

ould

you

like

to

go o

ut w

ith

me

toni

ght?

You

are

stup

id!

Is y

our

brai

n to

o sm

all?

You

have

a s

ausa

ge b

rain

!

Wha

t is

my

nam

e?I

want

to

rent

a r

efri

gera

tor

Wha

t is

the

col

our

of y

our

hair

?

Social

Insu

lt

Test

Page 15: Dialog systems at KTH

15

The

info

-see

king c

ate

gor

ies

How

man

y bo

oks

did

Stri

ndbe

rg w

rite

?W

hat

can

you

stud

y at

KTH

?W

here

are

the

res

taur

ants

on

Kung

sgat

an?

Wha

t ca

n I

ask

you?

Augu

st a

nswe

r m

y qu

esti

on I

kno

w yo

u kn

ow

ever

ythi

ngTh

en I

will

spe

ak a

t th

e sa

me

tim

e as

I h

old

down

the

but

ton

-wha

t is

you

r na

me,

age

ntW

hat’s

the

cap

ital

of

Finl

and?

Wha

t is

two

tim

es t

wo?

How

man

y p

eopl

e liv

e in

Mad

rid?

Dom

ain

Met

a

Fact

s

Use

r utt

erance

cate

gor

ies

during t

he

firs

t si

x dia

logue

turn

s

cate

gory

child

ren

women

men

only

soc

ializ

ing

34%

20%

20%

only

info

-see

king

28%

39%

34%

from

soc

ializ

ing

to in

fo-s

eeki

ng

alte

rnat

ing

7%6%

3%

31%

35%

43%

The

stat

isti

cs a

re b

ased

on

the

firs

t ut

tera

nces

(up

to s

ix)

from

all

user

s th

at s

aid

mor

e th

an t

wo u

tter

ance

s to

the

sys

tem

What

……

.. ?

•334 u

tter

ance

s in

clude

“what

”–

only

75 h

ave

“what

” in

initia

l pos

itio

n

•99 “

what

is

your

nam

e”–

all in

fin

al u

tter

ance

pos

itio

n–

only

13 initia

te a

n u

tter

ance

intr

owh

at …

…...

An e

xam

ple

of

a re

pet

itiv

e se

quen

ce

The

utte

ranc

e ”V

ad h

eter

kun

gen?

” (W

hat

is t

he n

ame

of t

he k

ing?

) as

ori

gina

l inp

ut (t

op) a

nd r

epea

ted

twic

e by

the

sam

e us

er

Featu

res

in r

epet

itio

n

010

20

30

40

50

more

clearly

articulated

increased

loudness

shiftingof

focus

Percentageofallrepetitions

adults

children

Som

e le

ssons

for

reco

gnitio

n•

lexic

al en

train

men

t–

use

bot

h u

ser

input

and s

yste

m o

utp

ut

•ad

aptive

to

–ap

plic

atio

n–

use

r–

dia

log

•use

thre

e re

cognitio

n s

yste

ms

in p

aralle

l–

continuou

s sp

eech

(def

ault)

–w

ord b

y w

ord (

erro

r re

solu

tion

)–

continuou

s sy

llable

s (c

onfiden

ce)

Page 16: Dialog systems at KTH

16

TheAugustsystem

Wha

t is

you

r na

me?

I cal

l mys

elf

Stri

ndbe

rg,

but

I do

n’t

real

ly h

ave

a su

rnam

eW

hen

wer

e yo

u bo

rn?

Stri

ndbe

rg w

as b

orn

in 1

849

Wha

t do

you

do

for

a liv

ing?

I can

ans

wer

que

stio

ns a

bout

St

rind

berg

, KT

H a

nd S

tock

holm

How

man

y pe

ople

live

in

Stoc

khol

m?

Ove

r a

mill

ion

peop

le li

vein

the

Sto

ckho

lm a

rea

Do

you

like

it h

ere?

Peop

le w

ho li

ve in

gla

ss h

ouse

ssh

ould

not

thr

ow s

tone

s Ye

s, tha

tw

as a

sm

art

thin

g to

say

!I c

ome

from

the

dep

artm

ent

of S

peec

h, M

usic

and

Hea

ring

Th

e Ro

yal I

nsti

tute

of

Tech

nolo

gy!

The

info

rmat

ion

is s

how

non

the

map

Than

k yo

u!Yo

u ar

e w

elco

me!

Goo

d by

e!Pe

rhap

s w

e w

ill m

eet

soon

aga

in!

Yes,

it m

ight

be

that

we

will

!St

rind

berg

was

mar

ried

th

ree

tim

es!

Stri

ndbe

rg w

as m

arri

ed

thre

e ti

mes

!

Ver

bal

topic

aliz

atio

n

NP+

STRE

ETTh

e ap

artm

ent

on S

ankt

Eriks

gata

n-w

hich

flo

or is

it o

nST

REET

Öst

erlång

gata

n 24

-doe

s it

hav

e a

balc

ony

COLO

RTh

e gr

een

apar

tmen

t-d

oes

it h

ave

a ba

lcon

yDEI

CTIC

+STR

EET

This o

ne-o

n Helen

ebor

gsga

tan

-doe

s it

hav

e a

bath

tub

COLO

R+ST

REET

The

yello

w on

e on

Koc

ksga

tan

20-d

oes

it h

ave

a ba

lcon

y

5101520

COLO

RCO

LOR+

STRE

ETD

EICT

IC+

STRE

ETN

P+ST

REET

STRE

ET

Number of occurrences

Abo

ut 1

0% o

f al

l req

uest

s fo

r in

form

atio

n ab

out

a sp

ecif

ic

apar

tmen

t co

ntai

ned

a to

pica

lized

ref

eren

ce. 4

% c

onta

ined

a

verb

al t

opic

al r

efer

ence

and

6%

a p

rece

ding

gra

phic

al r

efer

ence

Gra

phic

al to

pic

aliz

ation

[clic

k] +

DEI

CTIC

Whe

n wa

s th

is o

nebu

ilt[c

lick]

+PR

ONOUN

Doe

s it

have

a t

iled

stov

e[c

lick]

+NP

Doe

s th

e ap

artm

ent

have

a b

atht

ub[c

lick]

+EL

LIPS

EW

hich

flo

or[c

lick]

+CO

LOR

Whe

n wa

s th

e wh

ite

hous

ebu

ilt[c

lick]

+ S

TREE

TD

oes Hor

nsga

tan

59ha

ve a

tile

d st

ove

5101520

[clic

k]+

COLO

R[c

lick]

+D

EICT

IC[c

lick]

+EL

LIPS

E[c

lick]

+N

P[c

lick]

+PR

ON

OU

N[c

lick]

+ST

REET

Number of occurrences

Tim

ing o

f m

ouse

clic

ks

124 3 Average time between mouse click and subsequent speech [sec]

[clic

k]+

COLO

R[c

lick]

+D

EICT

IC[c

lick]

+EL

LIPS

E[c

lick]

+N

P[c

lick]

+PR

ON

OU

N[c

lick]

+ST

REET

Time between mouse click and speech [sec]

Individu

al U

sers

befo

re

afte

r-50510

Cla

ssic

sys

tem

s

•Res

earc

h s

yste

ms

–Voy

ager

(1989)

–ATIS

(1992)

–SU

ND

IAL

(1993)

–TRAIN

S (

1996)

•Applic

atio

n–

Phili

ps

Tra

in I

nfo

rmation

(1995)

•La

rg E

ffort

s–

Com

munic

ator

–Ver

bm

obil

TRIP

S

JamesAllen

etal"Tow

ardsconversationalhuman-com

puterinteraction,"

AIMagazine,22(4),2001

TheBehavioralAgent

(BA)planssystem

behaviorbasedon

its

goalsandobligations,the

user’sutterancesand

actions,and

changesin

theworldstate.

TheInterpretation

Manager(IM)

interpretsuser

input.Itbroadcasts

therecognized

speech

actsand

increm

entally

updatesthe

DiscourseContext.

TheGenerationManager

(GM)plansthespecific

contentofutterancesand

displayupdates.

Page 17: Dialog systems at KTH

17

Use

r and s

yste

m m

odel

A n

ew m

etho

d fo

r di

alog

ue m

anag

emen

t in a

n inte

llige

nt s

yste

m f

or

info

rmat

ion

retr

ieva

l -

Kenj

i Abe

, Kaz

ushi

ge K

urok

awa,

Kaz

unar

i Ta

keta

, Sum

io O

hno

and

Hir

oya

Fujis

aki,

ICSL

P 20

00

Use

r and s

yste

m m

odel

Pla

tform

s

Waxh

olm

Sys

tem

GRAMMATIK

&SEMANTIK

DIALOGKONTROLL

GRAFIK

IGENKÄNNING

AKUSTISKOCHVISUELL

TALSYNTES

DATABAS-

SÖKNING

LJUD

“WIZARDOFOZ”

Kartoroch

tabeller

Båttidtabeller,

Hamnpositioner,

Hotell,

Restauranger,

mm.

INMATNING

UTMATNING

Kontextkänsliga

regleroch

nätverk

LEXIKON

Manuellsimulering

Inspelningar

Tal

DATABAS

Ord

Ordklasser

Semantiskinformation

Uttal

TAL

Flat

model

TTS

ASR

TTS

ASR

ASV

TTS

desktop

audio

animated

agent

SQL

datab.

ASR

ASR

ASV

TTS

audio

device

animated

agent

SQL

desktop

audio

audio

coder

application,dialogengine

Multi-

laye

rm

odel

componentAPIs

ASV

TTS

audio

device

animated

agent

SQL

componentinteraction

services

high-levelprimitives

dialogcomponents

speechtechnologyAPI

audio

coder

application,dialogengine

ASR

speech

detector

resourcelayer

application-independentlayer

application-dependentlayer

Page 18: Dialog systems at KTH

18

Pla

tform

support

TTS

ASR

TTS

ASR

ASV

TTS

desktop

audio

animated

agent

SQL

datab.

ASR

ASR

ASV

TTS

audio

device

animated

agent

SQL

speech-techAPI desktop

audio

audio

coder

Genericdialoguemanager-SesaME

ATL

AS

Serviceplatform

Commonplatform

Servicedescriptions

Speech

TechnologyAPI(ATLAS)

Black

board

DB-s

InteractionManager

Dialogue

Engine

System

task

models

A-Agent

A-Agent

A-Agent

Decision

Agent

Decision

Agent

Decision

Agent

Dialog

interpreter

Dialogdescription

activator

Servicedescription

collection

SesaM

E Domain

task

models

User

models

Open

ServiceEnvironmentAPI

Eva

luation

•Ph

onet

ic a

nal

ysis

•W

ord

under

stan

din

g S

ynth

esis

/Rec

ognitio

n•

Dom

ain D

epen

den

t–

Voc

abula

ry S

ynta

x

•Sys

tem

Fee

dbac

k•

“Tas

k co

mple

tion

”•

How

lon

g t

ime

•H

ow m

any

turn

s•

Hap

py

and s

atis

fied

use

rs

Eva

luation

effo

rts

•Som

e pro

ject

s–

NIS

T–

SAM

–CO

CO

SD

A–

EAG

LES

–D

ISC

NIS

T-e

valu

ations

•N

IST -

Nat

ional

Inst

itute

of

Sta

ndar

ds

and T

echnol

ogy

(USA)

–htt

p:/

/ww

w.n

ist.

gov

/spee

ch/

•Are

as–

Com

munic

ator

(In

telli

gen

t Con

vers

atio

nal

Inte

rfac

es,

2000-)

–Spee

ch R

ecog

nitio

n (

Engl

ish,

Span

ish,

Man

dar

in)

•bro

adca

st n

ews

(1996-1

999)

•co

nver

sational

tel

ephone

speec

h (

1997-)

–Top

ic D

etec

tion

and T

rack

ing (

1998-,

Engl

ish,

Man

dar

in)

–In

form

atio

n E

xtra

ctio

n -

Entity

Rec

ognitio

n (

1999-)

–Spok

en D

ocum

ent

Ret

riev

al (

1997-2

000)

–Spea

ker

Rec

ognitio

n (

1996-)

Wor

d a

ccura

cy

GN

FB

I

N=

−−

−100*

Gwo

rd a

ccur

acy

Nnu

mbe

r of

wor

dsF

num

ber

of w

rong

wor

dsB

num

ber

of m

issi

ng w

ords

Inu

mbe

r of

inse

rted

wor

ds

Not

sen

siti

ve t

o wo

rd s

imila

riti

esEx

:i k

väll

-ikv

äll,

jag

-ja,

Vax

holm

-Va

xhol

ms,

bil -

rest

aura

ng

equ

ally

wro

ng

Page 19: Dialog systems at KTH

19

Per

ple

xity

Bperp

lexi

tyH

entr

opy

P(W

)pro

bab

ility

of

the w

ord

sequence

in t

he u

sed lan

guag

e E

xam

ple

:N

um

ber

seq

uen

ces

B =

11,

if a

ll dig

its

equally

pro

bab

le

HPW

PW

W

=−

∀�(

)log

()

2

B=2H

DARPA-e

valu

atio

n1988-1

999

05

10

15

20

25

198819891990199119921993199419951996199719981999

RMcommands(1000ord)

ATISspontanous(2000ord)

WSJreadnews(20000ord)

NAB,BroadcastNews

Transrciption(60000ord)

Acc

urac

y(%

)

DIS

C

•Spoke

n L

anguag

e D

ialo

gue

Sys

tem

san

d C

om

ponen

ts:

Bes

t pra

ctic

e in

dev

elopm

ent

and e

valu

atio

n•

Par

tner

s an

d p

eople

–N

atu

ral In

tera

ctiv

e Sys

tem

s La

bora

tory

(N

IS),

Denm

ark

–C

entr

e N

ational de la R

ech

erc

he S

cien

tifique (

CN

RS-L

IMSI)

Fra

nce

–U

niv

ers

ität

Stu

ttgar

t, G

erm

any

–Kunglig

a T

ekn

iska

Högsk

ola

n (

KTH

), S

weden

–Voca

lis L

td,

Engla

nd

–D

aim

ler-

Chry

sler

AG

, G

erm

any

–ELS

NET,

Euro

pe

•U

RL:

ww

w.d

isc.

dk

Use

r pro

file

Emailretrieval

Kamm,LitmanWalkerICSLP98

10

100

1000

Time

UserTurns

RecognitionScore

NovicewithoutTutorial

NovicewithTutorial

Expert

Gen

eral

Model

s of

Usa

bili

ty w

ith P

ARAD

ISE

Marily

n W

alk

er,

Candac

e Kam

m a

nd D

iane

Litm

an

Para

dis

eU

ser

Satisf

action

•I

found t

he

syst

em e

asy

to u

nders

tand in t

his

co

nve

rsat

ion.

(TTS P

erf

orm

ance

)•

In t

his

con

vers

atio

n,

I kn

ew w

hat

I c

ould

say

or

do

at

each

poi

nt

of t

he

dia

logue.

(U

ser

Exp

erti

se)

•The

syst

em w

orke

d t

he

way

I ex

pec

ted it

to in t

his

conve

rsat

ion.

( Exp

ecte

d B

ehav

iour)

•Base

d o

n m

y ex

per

ience

in t

his

con

vers

atio

n u

sing t

his

syst

em t

o get

tra

vel in

form

atio

n,

I w

ould

lik

e to

use

th

issy

stem

reg

ula

rly.

(Fu

ture

Use

)

Page 20: Dialog systems at KTH

20

Eva

luation

met

rics

•D

ialo

g E

ffic

iency

Met

rics

: Tot

al e

lapse

d t

ime,

Tim

e on

tas

k,Sys

tem

turn

s, U

ser

turn

s, T

urn

s on

tas

k, t

ime

per

turn

for

eac

hsy

stem

mod

ule

Eva

luation

met

rics

•D

ialo

g E

ffic

iency

Met

rics

: Tot

al e

lapse

d t

ime,

Tim

e on

tas

k,Sys

tem

turn

s, U

ser

turn

s, T

urn

s on

tas

k, t

ime

per

turn

for

eac

hsy

stem

mod

ule

•D

ialo

g Q

ual

ity

Met

rics

: W

ord A

ccura

cy, Sen

tence

Acc

ura

cy,

Mea

n

Res

pon

se lat

ency

, Res

pon

se lat

ency

var

iance

Eva

luation

met

rics

•D

ialo

g E

ffic

iency

Met

rics

: Tot

al e

lapse

d t

ime,

Tim

e on

tas

k,Sys

tem

turn

s, U

ser

turn

s, T

urn

s on

tas

k, t

ime

per

turn

for

eac

hsy

stem

mod

ule

•D

ialo

g Q

ual

ity

Met

rics

: W

ord A

ccura

cy, Sen

tence

Acc

ura

cy,

Mea

n

Res

pon

se lat

ency

, Res

pon

se lat

ency

var

iance

•Tas

k Succ

ess

Met

rics

: Pe

rcei

ved

task

com

ple

tion

, Exa

ctSce

nar

io

Com

ple

tion

, Any

Sce

nar

io C

omple

tion

Eva

luation

met

rics

•D

ialo

g E

ffic

iency

Met

rics

: Tot

al e

lapse

d t

ime,

Tim

e on

ta

sk,

Sys

tem

turn

s, U

ser

turn

s, T

urn

s on t

ask,

tim

e per

turn

for

each

syst

em m

odule

•D

ialo

g Q

ual

ity

Met

rics

: W

ord

Acc

ura

cy,

Sen

tence

Acc

ura

cy,

Mea

n R

espon

se lat

ency

, Res

pon

se lat

ency

va

rian

ce•

Tas

k Succ

ess

Met

rics

: Pe

rcei

ved t

ask

com

ple

tion

, Exa

ctSce

nar

io C

omple

tion

, Any

Sce

nar

io C

omple

tion

•U

ser

Sat

isfa

ctio

n:

Sum

of

TTS p

erfo

rmance

, Tas

k ea

se,

Use

rex

per

tise

, Exp

ecte

d b

ehavi

our,

Futu

re u

se.

Com

munic

ato

r: U

ser

Satisf

action

Tas

k co

mple

tion

Page 21: Dialog systems at KTH

21

Tas

k dura

tion

Pred

iction

of sa

tisf

act

ion?

PERFORMANCE=.25MRS+.33COMP-.33HELP

MRS

=meanrecognitionscore

COMP

=perceivedcompletion

HELP

=numberofhelpmessages

PERFORMANCE=Usersatisfaction

Covers41.3%ofthevariance

Som

e Chal

lenges

•D

ialo

g M

odel

ing

–st

atis

tica

l?

•In

itia

tive

–co

nve

rsat

ion

•Err

or

Han

dlin

g•

Multid

om

ain

•U

ser

model

ling –

Adap

tivi

ty•

Turn

Tak

ing

•M

ultim

odal Com

munic

atio

n

Dia

log M

anagem

ent

in M

IMIC

•In

itia

tive

model

ing

–dis

trib

ution

of

syst

em initia

tive

s

•G

oal

sel

ection

–goa

l th

at t

he

syst

em a

ttem

pts

to

reac

h

•Str

ateg

y se

lect

ion

–dia

log a

cts

dep

endin

g o

nin

itia

tive

dis

trib

ution

MIM

IC: A

n A

dapt

ive

Mix

ed I

niti

ativ

e Sp

oken

Dia

logu

e Sy

stem

for

In

form

atio

n Q

ueri

es J

enni

fer

Chu-

Carr

oll,

NA

ACL

200

0

Initia

tive

-Cue

det

ection

•D

isco

urs

e cu

es–

Tak

eOve

rTas

k•

when

use

r giv

es m

ore

info

than

nee

ded

–N

oN

ewIn

form

atio

n•

no

pro

gre

ss t

ow

ards

task

com

ple

tion

Adap

tation o

f th

e dia

log

•Eva

luat

e th

e dia

log c

ontinuousl

y–

Do

the

syst

em a

nd t

he

use

r have

the

sam

e goa

l–

Who

take

s th

e in

itia

tive

•Err

or

han

dlin

g–

Anal

ysis

and r

epai

r

Page 22: Dialog systems at KTH

22

Err

or

handlin

g in W

axh

olm

Inpu

t

Robu

st p

arse

Ans

wer

OK

Out

put

“I d

o no

t un

ders

tand

Out

put

“Thi

s is

wha

t I

unde

rsto

od….

.”

Pred

ict

topi

c. A

gree

?

Full

pars

e

1. lo

op2.

loop

Yes

No

The

HIG

GIN

S d

om

ain

•The p

rim

ary

dom

ain

of H

IGG

INS is

city

navi

gation for

ped

estr

ians.

Seco

ndarily

, H

IGG

INS is

inte

nded t

o p

rovi

de s

imple

info

rmation a

bout

the

imm

edia

te s

urr

oundin

gs.

Thisisa

3Dtest

environment

The

HIG

GIN

S d

omai

n

U:

Jag h

ar

en s

tor

byggnad

till

vänst

er

S:

Vilk

en f

ärg

har

den b

yggnaden?

U:

Ora

nge

S:

Bes

kriv

något

mer

U:

Jag h

ar

en g

lasb

yggnad

fram

för

mig

S:

Stä

lldig

mella

nden o

ch e

n t

räbyg

gnad

Err

ors

in s

poke

n d

ialo

gue

syst

ems

•W

hat

is

an e

rror?

–A d

evia

tion

fro

m a

n e

xpec

ted o

utp

ut

from

a

syst

em,

mod

ule

or

pro

cess

•D

evia

tion f

rom

what

?–

What

is

wri

tten

in t

he

requir

emen

t sp

ecific

atio

n

–W

hat

a h

um

an “

wiz

ard”

wou

ld d

o–

What

max

imis

es u

ser

satisf

action

•U

sers

nev

er m

ake

erro

rs in t

his

sen

se!

–D

isfluen

ces,

etc

, is

just

anot

her

beh

avio

ur

the

syst

em s

hou

ld h

andle

Err

or h

andlin

g r

esea

rch iss

ues

Userutterance

User

reaction/repair

Assume

understanding

Norecovery

Non-understanding

Assume

understanding

Err

or r

ecov

ery

(Non

-und

erst

andi

ng)

Ear

ly e

rror

det

ecti

on

Gro

undi

ng

Late

err

or d

etec

tion

Err

or r

ecov

ery

(Mis

unde

rsta

ndin

g)

Misunderstanding

U:

jag

trä

dh

ar

hej

en

sto

r

byg

gn

ad

en

till

vän

ster

S:

Vilk

en f

ärg

har

den

byg

gnad

en?

U:

ora

ng

e v

än

ster

Initia

l ex

per

imen

ts

•Stu

die

s on h

um

an-h

um

an c

onve

rsat

ion

•The

Hig

gin

s dom

ain (

sim

ilar

to M

ap T

ask)

•U

sing A

SR in o

ne

direc

tion t

o e

licit e

rror

han

dlin

g b

ehav

iour

Vocoder

User

Operator

Listens

Speaks

Reads

Speaks

ASR

Page 23: Dialog systems at KTH

23

Non-u

nder

standin

g e

rror

rec

ove

ry

•Res

ults

show

that

hum

ans

tend n

ot t

o si

gnal

non

-under

stan

din

g:

O:

Do

you s

ee a

wood

en h

ouse

in f

ront

of y

ou?

U:

YES

CR

OS

SIN

G A

DD

RES

S N

OW

(I p

ass

the

wood

en h

ouse

now

)O

: Can

you

see

a r

esta

ura

nt

sign?

•This

lea

ds

to–

Incr

ease

d e

xper

ience

of

task

succ

ess

–Fa

ster

rec

over

y fr

om n

on-u

nder

stan

din

g

•Ska

ntz

e, G

. (2

003).

Exp

loring h

um

an e

rror

han

dlin

g s

trat

egie

s: im

plic

atio

ns

for

spoke

n

dia

logue

syst

ems.

Err

or h

andlin

g r

esea

rch iss

ues

Userutterance

User

reaction/repair

Assume

understanding

Norecovery

Non-understanding

Assume

understanding

Err

or r

ecov

ery

(Non

-und

erst

andi

ng)

Ear

ly e

rror

det

ecti

on

Gro

undi

ng

Late

err

or d

etec

tion

Err

or r

ecov

ery

(Mis

unde

rsta

ndin

g)

Misunderstanding

Err

or h

andlin

g r

esea

rch iss

ues

Userutterance

User

reaction/repair

Assume

understanding

Norecovery

Non-understanding

Assume

understanding

Err

or r

ecov

ery

(Non

-und

erst

andi

ng)

Ear

ly e

rror

det

ecti

on

Gro

undi

ng

Late

err

or d

etec

tion

Err

or r

ecov

ery

(Mis

unde

rsta

ndin

g)

Misunderstanding

Early

erro

r det

ection

•The

syst

em m

ust

under

stan

d w

hat

it

does

n’t

under

stan

d–

Mea

sure

of

confiden

ce in its

under

stan

din

g•

Det

erm

ines

gro

undin

g b

ehav

iour

•Fa

cilit

ates

lat

e er

ror

det

ection

–D

ecid

ing w

hen

to

reje

ct a

nd w

hen

to

acc

ept

whol

e utt

eran

ces

or

part

s of

utt

eran

ces

•Shou

ld d

epen

d o

n–

Con

fiden

ce o

f under

stan

din

g–

Con

sequen

ce o

f non

-under

stan

din

g–

Con

sequen

ce o

f m

isunder

stan

din

g

Curr

ent

rese

arc

h

Userutterance

User

reaction/repair

Assume

understanding

Norecovery

Non-understanding

Assume

understanding

Err

or r

ecov

ery

(Non

-und

erst

andi

ng)

Ear

ly e

rror

det

ecti

on

Gro

undi

ng

Late

err

or d

etec

tion

Err

or r

ecov

ery

(Mis

unde

rsta

ndin

g)

Misunderstanding

Hig

gin

s dom

ain

-dem

onst

ration

Page 24: Dialog systems at KTH

24

CH

IL"C

om

pute

rs in t

he

Hum

an inte

ract

ion L

oop"

•In

tegra

ted P

roje

ct u

nder

the

Euro

pea

n

Com

mis

sion

's S

ixth

Fra

mew

ork

Pro

gra

mm

e.

•Coo

rdin

ated

by

Univ

ersi

tät

Kar

lsru

he

(TH

) an

d t

he

Frau

nhof

er I

nst

itute

IIT

B.

•CH

IL w

as lau

nch

ed o

n J

anuar

y, 1

st 2

004.

htt

p:/

/chil.

serv

er.d

e/

•D

aim

lerC

hry

sler

AG

, G

roup D

ialo

gue S

yste

ms,

Germ

any

•ELD

A,

Eva

luations

and L

anguage r

eso

urc

es D

istr

ibution A

gen

cy,

Fr

ance

•IB

M C

eska

Rep

ublik

a,

Jzec

h R

epublic

•RESIT

, Rese

arc

h a

nd E

duca

tion S

oci

ety

in I

nfo

rmation T

ech

nolo

gie

s,

Gre

ece

•IN

RIA

(In

stitut

National de R

ech

erc

he e

n I

nfo

rmatique

et

en

Auto

mat

ique),

Pro

ject

GRAVIR

, Fr

ance

•IR

ST (

Inst

ituto

Tre

ntino

diC

ultura

), I

taly

•KTH

(KunglTekn

iska

Högsk

ola

n),

Sw

eden.

•C

NRS,

LIM

SI

(Centr

e N

ational de la R

ech

erc

he S

cientifique t

hro

ugh

its

Labora

toire

d'Info

rmatique

pour

la m

écaniq

ue

et

les

scie

nce

s de

l'ingénie

ur)

, Fr

ance

•TU

E (

Tech

nis

che

Univ

ers

iteit

Ein

dhove

n),

The N

eth

erlands

•IP

D,

Univ

ersi

tät

Karlsr

uhe (

TH

) th

rough its

Inst

itute

IPD

, G

erm

any

•U

PC,

Univ

ersi

tat

Polit

ècnic

ade C

ata

lunya

, Spain

•U

niv

ers

ität

Karlsr

uhe (

TH

), I

nte

ract

ive

Sys

tem

s La

bs,

Germ

any

•Fr

aunhofe

r In

stitut

für

Info

rmations-

und D

ate

nve

rarb

eitung

(IIT

B),

Karlsr

uhe,

Germ

any

•Sta

nfo

rd U

niv

ersi

ty,

Clif

ford

Nass

, U

SA

•C

MU

, C

arn

egie

Mello

n U

niv

ers

ity,

USA

Challe

nge

•The

obje

ctiv

e–

to c

reat

e en

viro

nm

ents

in w

hic

h c

ompute

rs s

erve

hum

ans

who

focu

s on

inte

ract

ing w

ith o

ther

hum

ans

as o

ppos

ed t

o hav

ing

to a

tten

d t

o an

d b

eing p

reoc

cupie

d w

ith t

he

mac

hin

es t

hem

selv

es.

–In

stea

d o

f co

mpute

rs o

per

atin

g in

an iso

late

d m

anner

, an

d H

um

ans

in t

he

loop

of

com

pute

rs w

e w

ill p

ut

Com

pute

rs in t

he

Hum

an I

nte

ract

ion L

oop (

CH

IL).

•Com

pute

r Ser

vice

s–

mod

els

of h

um

ans

and t

he

stat

e of

thei

r ac

tivi

ties

and

inte

ntion

s. B

ased

on t

he

under

stan

din

g o

f th

e hum

an

per

ceptu

al c

onte

xt,

CH

IL c

ompute

rs a

re e

nab

led

to

pro

vide

hel

pfu

l as

sist

ance

im

plic

itly

, re

quirin

g a

min

imum

of

hum

an a

tten

tion

or

inte

rrupt

ions

CH

IL -

Ser

vice

s

•M

emory

Jog

(M

J).

–It

hel

ps

the

atte

ndee

s by

pro

vidi

ng info

rmat

ion r

elat

ed t

o th

e dev

elop

men

t of

the

even

t (m

eeting/l

ectu

re)

and t

o th

e par

tici

pan

ts.

MJ

pro

vides

con

text

-an

d c

onte

nt-

awar

e in

form

atio

n p

ull

and p

ush

, bot

h p

erso

nal

ized

and p

ublic

.

•Att

ention

Coc

kpit (

AC).

AC m

onitor

s th

e at

tention

and

inte

rest

lev

el o

f par

tici

pan

ts, su

ppor

ting indiv

idual

s w

ho

wan

t m

ore

or les

s in

volv

emen

t in

the

dis

cuss

ion.

It c

an a

lso

info

rm t

he

Soc

ially

-Suppor

tive

Wor

kspac

es a

bou

t th

e at

tention

al

stat

e of

the

par

tici

pan

ts.

•Con

nec

tor

(Con

nec

tor)

. –

Con

text

-aw

are

connec

ting s

ervi

ces

ensu

re t

hat

tw

o par

ties

ar

e co

nnec

ted w

ith e

ach o

ther

at

the

right

pla

ce,

tim

e an

d

by

the

bes

t m

edia

, w

hen

it

is m

ost

appro

pri

ate

and

des

irab

le for

bot

h p

arties

to

be

connec

ted.

Ljuddes

ign för

talg

ränss

nitt

•U

tvec

klin

g a

v et

t sy

stem

för

jourh

avan

de

fast

ighet

sskö

tare

•H

ur

ser

en f

astighet

sskö

tard

om

än u

t?–

Kom

plic

erad

! -

Käl

lare

med

mån

ga

rum

•va

tten

kran

ar•

elsk

åp•

Ven

tila

tionsa

nlä

ggn

ingar

KorridorB1

Huvudkranrum

D3

VarmvattenrumB

4HuvudfläktrumA1

Sektionslufts-rum

C2

Huvud-

elcentralB2

ElrumD4

SäkringsrumB3

KorridorD1

KorridorA3

Sektionsvatten-

rumC1

VattenrumA2

Luftkylnings-

centralD2

Elmätarcentral

C0

KorridorC4

Ventilations-rum

C3

Kom

pliceraddomän!

Ljud

desi

gn f

ör t

algr

änss

nitt

Page 25: Dialog systems at KTH

25

•Sys

tem

et ä

r an

vändar

ens

”ögon

och

hän

der

”, t

.ex.

:

S>DubefinnerdigikorridorC4.Härifrånkandugåtillföljande

rum:elrumD3,sektionsvattenrumC3.Vad

villdu

göra?

A>

gåtillsektionsvattenrumC3

S>DubefinnerdigisektionsvattenrumC3.Härifrånkandugå

tillföljanderum:korridorC4,elmätarcentralC2,ventilations-

rumE3.Vad

villdugöra?

A>

tittapåvattenkranarna

S>Dutittarpåsektionsvattenkranarna.Följandekranaräröpp-

na:kran

2,kran

7.Restenavkranarnaärstängda.Vad

villdu

göra?Ljuddes

ign för

talg

ränss

nitt

•H

ur

kan lju

d a

nvä

ndas

som

nav

igat

ionss

töd?

•Auditiv

a ik

oner

–tv

å ty

per

för

ekom

mer

i t

vå o

lika

rolle

r•

bak

gru

nds

ljud (

konte

xtgiv

are)

åter

kopplin

g på

kom

man

do

–[e

xem

pel

dial

og u

tan lju

d]

–[e

xem

pel

dial

og m

ed lju

d]

Ljuddes

ign

för

talg

ränss

nitt

The

End