
Dekang Lin at AI Frontiers: Adding Conversation to GUIs


Adding Conversation to GUIs

Dekang Lin, Naturali


A Tale of Two Uber Rides

uber ride to crowne plaza sfo

Naturali

A Beijing-based startup company

Upgrade apps with a speech interface

Naturali Sesami

✦ Translate speech inputs to action sequences in apps and execute them on users’ behalf.

✦ Chinese version launched on LeTV phones as a system app on April 12, 2017

✦ Available as a third-party app on all Android phones since Aug. 2017

Advantages of Speech

Speed:

✦ voice input is three times as fast as typing

Hands-free:

✦ send messages, play music, order food

✦ turn on hotspot: 5 clicks

Mind-free:

✦ where is my luggage?

Voice Assistants

Chat window

Fulfillment by backend API calls

Chat + API: the downsides

Chat assistants displace apps, but chat is not the best mode of interaction for everything:

editing

browsing

viewing

Nonetheless, there is plenty of need for voice interaction.

who has access to this?

Who has access? Just ask

Chat + API: the downsides

Re-invention of user experience inside the chat window:

✦ usually not as good as specialized apps,

✦ requires a great deal of repeated development effort


Chat + API: the downsides

Economic interests of the assistant and the backend services may not be aligned.

Naturali Sesami

A thin, transparent translation layer over apps:

✦ voice ➜ front-end UI actions

Seamless integration of speech and graphics:

✦ Existing GUI interactions are still available

✦ Voice interaction is available on any app page

Use Yelp to find Greek food near Santa Clara Convention Center

Voice to Actions in Three Steps

Speech Recognition: sound → text

✦ data

Semantic Interpretation: text → intent

✦ knowledge

Plan Generation: intent → actions

✦ grounding
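The three steps compose into a simple pipeline. The sketch below only illustrates that composition: every function is a hypothetical stub (the recognizer just decodes bytes, the interpreter knows one hard-coded phrase, and the action strings are invented), not Naturali's implementation.

```python
from typing import Dict, List


def recognize(audio: bytes) -> str:
    """Step 1, sound -> text. Stub: treats the bytes as an existing transcript."""
    return audio.decode("utf-8").strip().lower()


def interpret(text: str) -> Dict[str, str]:
    """Step 2, text -> intent. Stub: one hard-coded phrase pattern."""
    prefix = "uber ride to "
    if text.startswith(prefix):
        return {"task": "UberRide", "dest": text[len(prefix):]}
    return {"task": "Unknown"}


def plan(intent: Dict[str, str]) -> List[str]:
    """Step 3, intent -> actions. Stub: invented UI action strings."""
    if intent["task"] == "UberRide":
        return ["open Uber", "set destination: " + intent["dest"], "tap request"]
    return []


def voice_to_actions(audio: bytes) -> List[str]:
    # The output of each step is the input of the next.
    return plan(interpret(recognize(audio)))
```

Calling `voice_to_actions(b"uber ride to crowne plaza sfo")` walks the request from the first slide through all three stages.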

Speech Recognition: sound → text

Third party services

Open source tools

Naturali Speech

End-to-end DNN: CNN+LSTM+Attention+CTC

✦ built from scratch with TensorFlow

✦ trained with thousands of hours of transcribed speech

Personalized and contextualized language model:

✦ contact names

✦ app-specific vocabulary
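For readers unfamiliar with CTC, the sketch below shows the standard greedy way to turn per-frame CTC outputs into text: take the most likely symbol in each frame, collapse consecutive repeats, then drop the blank symbol. It is a generic illustration in plain Python, not Naturali's decoder; the `_` blank symbol and the dictionary-per-frame input format are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence

BLANK = "_"  # assumed blank symbol for this example


def ctc_greedy_decode(frames: Sequence[Dict[str, float]]) -> str:
    """Greedy CTC decoding over per-frame symbol probabilities."""
    # 1. Pick the most probable symbol in each frame.
    best: List[str] = [max(probs, key=probs.get) for probs in frames]
    # 2. Collapse consecutive repeats, 3. drop blanks.
    out: List[str] = []
    prev: Optional[str] = None
    for symbol in best:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)
```

A blank between two identical symbols keeps them separate, which is how CTC can emit doubled letters.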

Semantic Interpretation: text → intent

An intent identifies a task and the necessary information (parameters) for the task

Example:

✦ task: FlightSearch

✦ parameters: (to, from, date, airline, class)
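One way to represent such an intent in code is a task name plus a parameter dictionary. This is a minimal sketch; the `Intent` class, the `SCHEMAS` table, and the sample values are illustrative assumptions, and only the task and parameter names come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Parameter names per task; FlightSearch is the example from the slide.
SCHEMAS = {"FlightSearch": ("to", "from", "date", "airline", "class")}


@dataclass
class Intent:
    task: str
    # A dict is used because "from" and "class" are Python keywords
    # and cannot be dataclass field names.
    parameters: Dict[str, str] = field(default_factory=dict)


def unknown_parameters(intent: Intent) -> List[str]:
    """Return parameter names that the task's schema does not define."""
    allowed = SCHEMAS[intent.task]
    return [name for name in intent.parameters if name not in allowed]
```

For example, `Intent("FlightSearch", {"to": "SFO", "from": "PEK"})` is a valid but partially specified intent.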

Entities and Types

Persons: singers/directors/contacts

Locations: cities/POIs/addresses

Apps and Games

Media: songs/shows/movies/books

Time and Date

Food

Sports teams

……

Recognizing Thousands of Types

It is not an option to use manually labeled training examples.

An alternative is to use naturally annotated data:

✦ Hearst patterns: NP_type such as NP_inst

✦ Other examples: navigate to NP_loc
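The two patterns can be sketched with regular expressions. This is a deliberately crude approximation: a real system would use a noun-phrase chunker, whereas here the type is a single word before "such as" and instances are split on commas and "and"/"or" (so a name like "Simon and Garfunkel" would be split incorrectly). The function names and the `location` label are assumptions.

```python
import re
from typing import List, Optional, Tuple

# "NP_type such as NP_inst": one word for the type, then an instance list.
HEARST = re.compile(r"(\w+) such as ([^.;]+)")
# Separators inside the instance list: ", ", ", and ", " and ", " or ".
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")
# "navigate to NP_loc": whatever follows the command is taken as a location.
NAVIGATE = re.compile(r"navigate to (.+)", re.IGNORECASE)


def hearst_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Extract (instance, type) pairs from "<type> such as <instances>"."""
    pairs = []
    for match in HEARST.finditer(sentence):
        type_word = match.group(1).lower()
        for inst in SEPARATOR.split(match.group(2)):
            inst = inst.strip()
            if inst:
                pairs.append((inst, type_word))
    return pairs


def navigate_location(utterance: str) -> Optional[Tuple[str, str]]:
    """Label the object of "navigate to ..." as a location."""
    match = NAVIGATE.search(utterance)
    return (match.group(1).strip(), "location") if match else None
```

Run over a large text or query log, pairs like these serve as training labels without manual annotation.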

Multi-round Conversation

Complex intents may not be articulated in one shot:

✦ FlightSearch(to, from, date, airline, class)

A multi-round conversation incrementally collects information from the user and guides the user through the process.
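A common way to implement this kind of incremental collection is slot filling: ask for one missing parameter per round until the intent is complete. The sketch below assumes that approach (the talk does not say how Naturali's dialog manager works), and the user's replies are simulated with a dictionary.

```python
from typing import Dict, Optional, Tuple

REQUIRED = ("to", "from", "date", "airline", "class")  # FlightSearch parameters


def next_missing(filled: Dict[str, str]) -> Optional[str]:
    """Return the first required parameter the user has not supplied yet."""
    for slot in REQUIRED:
        if slot not in filled:
            return slot
    return None


def run_dialog(initial: Dict[str, str],
               simulated_answers: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Ask for one missing slot per round; return the full intent and round count."""
    filled = dict(initial)
    rounds = 0
    while True:
        slot = next_missing(filled)
        if slot is None:
            return filled, rounds
        # In a real assistant this would be a spoken question and answer.
        filled[slot] = simulated_answers[slot]
        rounds += 1
```

If the first utterance already gives the destination and date, three further rounds complete the FlightSearch intent.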

Dialog Management

Composite Intents

Messenger chat with Alex and say let’s meet on saturday

✦ OpenMessenger

✦ ChatWithPerson

✦ SendMessage

get a uber black ride to SFO

✦ UberRide

✦ SetDest

✦ SelectUberBlack
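A composite intent can be modeled as an ordered list of sub-intents, each carrying its own arguments. In this sketch the sub-intent names come from the slide; the `SubIntent` class and the argument names (`person`, `text`, `dest`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SubIntent:
    name: str
    args: Dict[str, str] = field(default_factory=dict)


def messenger_chat(person: str, text: str) -> List[SubIntent]:
    """Decompose "Messenger chat with <person> and say <text>"."""
    return [
        SubIntent("OpenMessenger"),
        SubIntent("ChatWithPerson", {"person": person}),
        SubIntent("SendMessage", {"text": text}),
    ]


def uber_black_ride(dest: str) -> List[SubIntent]:
    """Decompose "get a uber black ride to <dest>"."""
    return [
        SubIntent("UberRide"),
        SubIntent("SetDest", {"dest": dest}),
        SubIntent("SelectUberBlack"),
    ]
```

The order matters: each sub-intent assumes the app state left behind by the previous one.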

Messenger Chat

Plan Generation: intent → actions

Grounding establishes the connection between the inside (the assistant) and the outside (apps and devices).

Example:

✦ intent:

{"task": "FlightStatus", "number": "UA888", "date": "2017-11-04"}

✦ action:

select * from flight_db where airline = 'United Airlines' and flight_num = '888' and year = 2017 and month = 11 and day = 4
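The FlightStatus example can be grounded with a small function that turns the intent's fields into the query's fields. This is a sketch under stated assumptions: the `AIRLINES` lookup table and the `ground` function are hypothetical, the flight number is assumed to start with a two-letter airline code, and the query is parameterized rather than assembled by string concatenation.

```python
from datetime import date
from typing import Dict, Tuple

# Hypothetical lookup from airline code to the name stored in flight_db.
AIRLINES = {"UA": "United Airlines"}

QUERY = (
    "SELECT * FROM flight_db "
    "WHERE airline = ? AND flight_num = ? "
    "AND year = ? AND month = ? AND day = ?"
)


def ground(intent: Dict[str, str]) -> Tuple[str, tuple]:
    """Map a FlightStatus intent to a parameterized SQL query."""
    if intent["task"] != "FlightStatus":
        raise ValueError("unsupported task: " + intent["task"])
    code, number = intent["number"][:2], intent["number"][2:]
    day = date.fromisoformat(intent["date"])
    return QUERY, (AIRLINES[code], number, day.year, day.month, day.day)
```

The returned pair can be passed directly to a DB-API cursor, for example `sqlite3`'s `connection.execute(sql, params)`.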

Actions on Google

grounding

What is my data usage?

Teaching a New Skill

Grounding by Crowd Sourcing

Skills = context + expression + actions
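A skill made of a context, an expression, and actions can be stored and looked up as in the sketch below. How Sesami actually represents context or matches expressions is not stated in the talk; here the context is a plain string, matching is exact after lowercasing, and the recorded actions are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Skill:
    context: str        # where the skill applies (assumed: a page or app name)
    expression: str     # what the user says
    actions: List[str]  # UI actions recorded while teaching (invented here)


class SkillStore:
    def __init__(self) -> None:
        self._skills: List[Skill] = []

    def teach(self, skill: Skill) -> None:
        """Add a skill; it is usable immediately."""
        self._skills.append(skill)

    def lookup(self, context: str, utterance: str) -> Optional[Skill]:
        """Find a skill whose context and expression both match."""
        wanted = utterance.strip().lower()
        for skill in self._skills:
            if skill.context == context and skill.expression.lower() == wanted:
                return skill
        return None
```

Teaching `Skill("home screen", "What is my data usage?", ["open Settings", "tap Data usage"])` makes that request answerable on the next lookup.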

Crowd Sourced Skills

Skills are immediately usable by the creator.

✦ The user may share the skills with others, e.g., tech support for parents

Vetted skills can be made available to the public

Summary

Voice interaction is inevitable

Naturali Sesami translates user requests into sequences of actions in apps.

Sesami grows by crowd sourcing skills.

Join us!

✦ [email protected]