
The Golem Group (1998-2011)


Two types of conversational structures are considered: Obligations and Common Ground.

The dialogue acts contribute to conversations as "charges" or "credits" in at least one of the two main structures.

Any charge should be credited to balance the transaction and continue with the next one until the task is finished.
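
The charge and credit bookkeeping can be illustrated with a minimal sketch. The `Transaction` class, its method names, and the labels in the usage lines are hypothetical and are not part of the DIME-DAMSL scheme itself; the sketch only shows that a transaction is balanced once every charge in both structures has been credited.

```python
from dataclasses import dataclass, field

# The two conversational structures named in the text.
STRUCTURES = ("obligations", "common_ground")


@dataclass
class Transaction:
    """Hypothetical ledger of open charges for one transaction."""

    open_charges: dict = field(
        default_factory=lambda: {name: set() for name in STRUCTURES}
    )

    def charge(self, structure: str, label: str) -> None:
        """A dialogue act opens a charge in one structure."""
        self.open_charges[structure].add(label)

    def credit(self, structure: str, label: str) -> None:
        """A later dialogue act credits (closes) an open charge."""
        if label not in self.open_charges[structure]:
            raise ValueError(f"no open charge {label!r} in {structure}")
        self.open_charges[structure].remove(label)

    def is_balanced(self) -> bool:
        """True when no structure has an uncredited charge left."""
        return not any(self.open_charges.values())


# Illustrative assumption: an instruction charges an obligation,
# and carrying out the instruction credits it.
transaction = Transaction()
transaction.charge("obligations", "place-object")
pending = transaction.is_balanced()      # False: the charge is still open
transaction.credit("obligations", "place-object")
done = transaction.is_balanced()         # True: the transaction can close
```

Keeping one set of open charges per structure makes the balance check a single pass over the ledger, and crediting a charge that was never opened fails loudly instead of passing silently.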

The empirical basis of the DIME-DAMSL scheme is the DIME Corpus, which was produced to analyze dialogue acts in practical dialogues. It consists of 26 dialogues in Spanish in which the user designs a kitchen by giving spoken instructions to the system.

DIME-DAMSL 1998-2006

The DIME group designed and produced a large speech corpus to create acoustic models for Mexican Spanish. The DIMEx100 Corpus is composed of 6,000 sentences (between 5 and 15 words each) recorded by 100 speakers. Each sentence was analyzed with Mexbet, a phonetic alphabet, and tagged at multiple levels:

- T22 (phonemes)
- T44 (allophones, coarse)
- T54 (allophones, fine)
- TP (words)

A speech recognition system for Mexican Spanish was built using the final tagging and the Sphinx speech recognizer.

DIMEx100 2003-2006

GOLEM 2001-2009

Golem made its debut at Universum, UNAM's science museum, in 2007, and was widely covered by Mexican TV, radio, and press. It gave several demonstrations at academic events in Mexico during 2008 and 2009 before being retired.

GUESS THE CARD 2009-2011

Golem's capabilities were tested with the "Guess the card" game. This fixed application is a permanent exhibition at the Universum Museum, where it presents Artificial Intelligence technologies to the general public. The interaction is carried out through spoken Spanish and visual interpretation.

GOLEM-II+ 2010-2011

Golem-II+ is the group's newest service robot. Like the previous version, it can guide a poster session. In addition, it can recognize pointing gestures, navigate through busy rooms, and locate its user by sound; furthermore, its framework now includes the tests of the RoboCup@Home competition. Golem-II+ is based on a cognitive architecture named IOCA (Interaction-Oriented Cognitive Architecture).

DIME-DAMSL is a theory, inspired by DAMSL, about how practical dialogues or task-oriented conversations are structured.

In DIME-DAMSL, a practical dialogue not only transmits information about the task, but also manages the task and the dialogue itself.

A practical dialogue is a series of transactions. In each transaction, obligations are created to reach a specific goal of the task. To achieve this, the conversational agents negotiate their levels of agreement and understanding.

DIME 1998-2009

The DIME group was a research team focused on developing a theory of conversation and its computer implementation, an infrastructure for building spoken Spanish recognition systems, and a flexible interaction-oriented architecture that can be embodied on different hardware platforms to develop applications in diverse domains.

This project gave the Golem group its name. The main interest of the research team is the development of multimodal interaction systems for fixed and mobile platforms. Golem was the first implementation of a multimodal system in a service robot. It was able to guide a poster session through simple spoken conversation in Spanish and to move to the selected poster. The system was a set of computational agents, each one representing a modality of information.
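
The agent-based organization can be pictured with a minimal sketch in which a facilitator routes messages to one agent per modality. The `Facilitator` class, the modality names, and the two example agents are hypothetical illustrations; they are not the group's actual implementation or message protocol.

```python
from typing import Callable, Dict

# A hypothetical agent is a function that takes a message and returns a result.
Agent = Callable[[str], str]


class Facilitator:
    """Hypothetical router that forwards messages to modality agents."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def register(self, modality: str, agent: Agent) -> None:
        """Attach one agent to a modality name."""
        self._agents[modality] = agent

    def send(self, modality: str, message: str) -> str:
        """Deliver a message to the agent registered for a modality."""
        if modality not in self._agents:
            raise KeyError(f"no agent registered for {modality!r}")
        return self._agents[modality](message)


def speech_synthesizer(text: str) -> str:
    """Stand-in for a speech synthesizer agent."""
    return f"say: {text}"


def navigation(target: str) -> str:
    """Stand-in for a navigation agent."""
    return f"move to: {target}"


facilitator = Facilitator()
facilitator.register("speech", speech_synthesizer)
facilitator.register("navigation", navigation)
reply = facilitator.send("navigation", "AI poster")
```

Routing every message through a single facilitator keeps the agents independent of each other, so a modality can be added or replaced without touching the rest.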

Departamento de Ciencias de la Computación

THE GOLEM GROUP 1998-2011

Luis A. Pineda and the Golem Group
Computer Science Department
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas
Universidad Nacional Autónoma de México
[email protected]
http://golem.iimas.unam.mx

[Figure: two dialogue model graphs used by the Dialogue Manager. Legend: Main DM, Subordinated DM (DM = Dialogue Model).
First graph: nodes is, ls1, ls2, ls3, rs, rs1, rs2, rs3, fs; edge labels ε:ra1(tour), ok:ra2([ai,pr,pl]), no:ra3(tour), ai:ra4(ai), pr:ra4(pr), pl:ra4(pl), ε:ra5(ai), ε:ra5(pr), ε:ra5(pl), ok:w.
Second graph: nodes is, ls1, ls2, ts1, ts2, ts3, fs; edge labels empty:ra1([per,area,proy]), per:ra2(per), area:ra2(area), proy:ra2(proy), no:ra2(error), ε:ra3(per), ε:ra3(area), ε:ra3(proy), no:ra3(ai), no:ra4(ai), ok:w, help, ε.]

[Figure: interpretation and output flow around the Dialogue Manager. Input side: Speech Recognition ("I want to visit the AI poster..."), Intentions Filter, Expected Intentions, Recognized Intention, Situation Interpretation. Output side: Multimodal Rhetorical Act, Interpretation of Basic Rhetorical Acts, Basic Rhetorical Acts (Modal Specific), Output Devices: Display Image or Video, Navigation, Synthesizer (text: "The developments in AI...").]

Golem: Hello, do you want to play?

Player: Yes, please.

Player: Is it round?

Golem: Yes, it's round.

...

Player: Is it a planet?

Golem: No, sorry, it's not a planet. You can ask me again, but the question session is almost finished.

Player: Is it yellow?

Golem: Yes, it's yellow. Now, show me the card you think it is.

Player: (The user puts the card in front of the camera)

Golem: I can't see very well, but I think your card is this one. (The system displays an image on the screen)

Player: Yes, that's the card I think you chose.

Golem: You won! I chose the card of the Sun.

[Table: utterances of one transaction and their dialogue acts.]

Utterance (dialogue acts)

1. u: after that <sil> can you put <sil> the cooker hood on the top of the <sil> of the stove (action-dir)

2. s: okay (commit, accept)

3. s: <move-obj> (move-obj)

4. s: is this okay? (info-request)

5. u: yes, it-s okay (answer, accept)

Charge and credit columns: obligations (charge, credit); common ground: agreement (charge, credit) and understanding (charge, credit).

Cell entries: 1 1 1 2 2 1 3 3 4 4 4 4

[Figure: Golem's Agents Architecture. Components: Dialogue Manager & Facilitator, Video Controller, Image Controller, Speech Recognition Agent, Speech Synthesizer Agent, Navigation Agent. Other label: Voice.]