Human – Network Voice Interface in A Wireless Era


Page 1: Human – Network Voice Interface in A Wireless Era

Human – Network Voice Interface in A Wireless Era

Page 2: Human – Network Voice Interface in A Wireless Era

Information–related Activities, Applications and Services in Future Network Era

• Multi–media, Multi–lingual, Multi–functionalities
• Cross–cultures, Cross–domains, Cross–regions
• Integrating All Knowledge Systems and Information–related Activities and Services Globally
• Multiple User Terminals
– telephone set, hand set, PDA, vehicular electronics, home appliance, personal computer, etc.

Future Integrated Networks

Real–time Information
– weather, traffic
– flight schedule
– stock price
– sports scores

Electronic Commerce
– virtual banking
– on–line transactions
– on–line investments

Knowledge Archives
– digital libraries
– virtual museums

Intelligent Working Environment
– e–mail processors
– intelligent agents
– teleconferencing
– distant learning

Private Services
– personal notebook
– business databases
– home appliances
– network entertainments

Page 3: Human – Network Voice Interface in A Wireless Era

Wireless Access of Global Multi–media Information

• At Any Time, from Anywhere
• As Handset Size Shrinks While Required Functionalities Grow and the User Environment Changes, Voice Interface will be Useful for all User Terminals
• Examples
– voice retrieval, voice browser, voice portal, voice web
– spoken dialogue based access to intelligent agents

Page 4: Human – Network Voice Interface in A Wireless Era

Scenario for Network Information Access

[Figure: block diagram. Labels: speech; Spoken Dialogue; speech information; text information; Text-to-speech Synthesis; Information Retrieval; Internet; Private Services/Databases/Applications; Public Services/Information/Knowledge; text, image, video, speech, …]

Page 5: Human – Network Voice Interface in A Wireless Era

Convergence of PSTN and Internet

• PSTN (for Voice) and Internet (for Data and Multi-media Contents) are Converging
• Driving Force for the Convergence
– “anywhere, any time” of wireless services
– voice provides the most convenient and natural interaction interface
– attractive contents over the Internet
– contents (human information) are why the Internet is attractive, while voice directly carries human information
– Speech-enabled Access of Web-based Applications

[Figure labels: handsets, telephones, PSTN, Internet, PCs, servers]

Page 6: Human – Network Voice Interface in A Wireless Era

Voice Interface for Human-network Interaction

– huge volumes of data disseminated across the globe by optical fiber networks

– any time, from anywhere by wireless terminals

– new platforms accessing the global network information/services: vehicular electronics, PDA, handset, home appliance, etc.

– traditional keyboard/mouse no longer adequate: size shrinkage, different user environment, etc., while desired functionalities/human–network interactions are increasing

– voice interface will be one of the few most important, natural, user-friendly, attractive interfaces

– examples: voice retrieval, voice browser, voice portal, voice web, voice–based web–user interaction, voice–based web tools/Application Interfaces, etc.

– voice interface is the only major “missing link” in the “semi–mature” technology chain

Page 7: Human – Network Voice Interface in A Wireless Era

Core Technologies / Functionalities for Voice Interface

Page 8: Human – Network Voice Interface in A Wireless Era

Speech Recognition as a pattern recognition problem

[Figure: block diagram. Recognition path: unknown speech signal x(t) → Feature Extraction → feature vector sequence X → Pattern Matching → Decision Making → output word W. Training path: training speech y(t) → Feature Extraction → Y → Reference Patterns, which feed Pattern Matching.]
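The pattern-matching view on this slide can be sketched in a few lines of Python. The slide does not say which matching method is used; dynamic time warping (DTW) is assumed here purely as an illustration, and the names `dtw_distance`, `recognize`, and the toy feature vectors are hypothetical.

```python
import math

def dtw_distance(x, y):
    """Dynamic time warping distance between two feature vector sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(X, reference_patterns):
    """Decision making: pick the word W whose reference pattern Y is closest to X."""
    return min(reference_patterns, key=lambda w: dtw_distance(X, reference_patterns[w]))

# Toy 2-dimensional feature vectors (made-up numbers, not real speech features).
reference_patterns = {
    "yes": [(0.0, 1.0), (0.5, 1.0), (1.0, 0.0)],
    "no": [(1.0, 1.0), (1.0, 0.5), (0.0, 0.0)],
}
X = [(0.0, 0.9), (0.1, 1.0), (0.6, 1.0), (1.0, 0.1)]
W = recognize(X, reference_patterns)
```

DTW tolerates the unknown utterance being longer or shorter than the stored reference, which is why it is a common choice for small-vocabulary template matching.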

Page 9: Human – Network Voice Interface in A Wireless Era

Basic Approach for Large Vocabulary Speech Recognition

• A Simplified Block Diagram

[Figure: block diagram. Main path: Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. Knowledge sources: Acoustic Models (Acoustic Model Training on Speech Corpora), Lexicon (Lexical Knowledge-base), Language Model (Language Model Construction on Text Corpora). Additional label: ICGGrammar.]

• Example Input Sentence: this is speech
• Acoustic Models: (th-ih-s-ih-z-s-p-iy-ch)
• Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
• Language Model: (this) – (is) – (speech)
  P(this) P(is | this) P(speech | this is)
  P(w_i | w_{i-1}): bi-gram language model
  P(w_i | w_{i-1}, w_{i-2}): tri-gram language model, etc.
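The bi-gram approximation on this slide, P(w_i | w_{i-1}), can be illustrated with a minimal count-based sketch. The three-sentence corpus, the `<s>` start marker, and the function names are assumptions made for the example; no smoothing is applied, so unseen word pairs get probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """Approximate P(w_1 ... w_n) as the product of P(w_i | w_{i-1})."""
    words = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # history never seen in training
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob

# Hypothetical toy text corpus.
corpus = ["this is speech", "this is text", "that is speech"]
unigrams, bigrams = train_bigram(corpus)
p = sentence_probability("this is speech", unigrams, bigrams)
p_unseen = sentence_probability("speech is this", unigrams, bigrams)
```

With this corpus the word order "this is speech" scores higher than "speech is this", which is the role the language model plays during decoding.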

Page 10: Human – Network Voice Interface in A Wireless Era

Speech Recognition Technologies, Applications and Problems

• Word Recognition

– voice command/instructions

• Keyword Spotting

– identifying the keywords out of a pre-defined keyword set from input voice utterances

• Large Vocabulary Continuous Speech Recognition

– entering longer texts

– remote dictation

• Speaker Dependent/Independent/Adaptive

• Acoustic Reception/Background Noise/Channel Distortion

• Read/Spontaneous/Conversational Speech

Page 11: Human – Network Voice Interface in A Wireless Era

Text-to-speech Synthesis

[Figure: block diagram. Input Text → Text Analysis and Letter-to-sound Conversion → Prosody Generation → Signal Processing and Concatenation → Output Speech Signal. Knowledge sources: Lexicon and Rules, Prosodic Model, Voice Unit Database.]

• Transforming any input text into corresponding speech signals
• E-mail/Web page reading
• Prosodic modeling
• Basic voice units/rule-based, non-uniform units/corpus-based

Page 12: Human – Network Voice Interface in A Wireless Era

Speaker Verification

[Figure: block diagram. input speech → Feature Extraction → Verification → yes/no, with Speaker Models feeding Verification.]

• Verifying the speaker as claimed
• Applications requiring verification
• Text dependent/independent
• Integrated with other verification schemes
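The yes/no decision on this slide can be sketched as a score compared against a threshold. The slide does not specify the speaker model or the scoring function; a single stored feature vector per speaker, cosine similarity, the threshold of 0.9, and the speaker names are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(features, claimed_speaker, speaker_models, threshold=0.9):
    """Return True (yes) if the features match the claimed speaker's model."""
    model = speaker_models.get(claimed_speaker)
    if model is None:
        return False  # no enrolled model for the claimed identity
    return cosine_similarity(features, model) >= threshold

# Hypothetical enrolled speaker models and one test utterance.
speaker_models = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.8, 0.3]}
utterance = [0.85, 0.15, 0.45]
accepted = verify(utterance, "alice", speaker_models)
rejected = verify(utterance, "bob", speaker_models)
```

Raising the threshold trades fewer false acceptances for more false rejections, which is why verification is often combined with other schemes, as the slide notes.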

Page 13: Human – Network Voice Interface in A Wireless Era

Information Retrieval Including Voice

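• Text Documents/Instructions
• Speech Documents/Instructions
• Voice Personal Notebook/Private Database

[Figure: a speech instruction or a text instruction, for example 我想找有關新政府組成的新聞 ("I would like to find news about the formation of the new government"), is matched against text documents d1, d2, d3 and speech documents d1, d2, d3, for example 總統當選人陳水扁今天早上… ("President-elect Chen Shui-bian this morning…").]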
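Matching an instruction against a set of documents can be sketched as ranking by vector similarity. This is a minimal illustration, assuming that speech instructions and speech documents have already been transcribed to text and that whitespace tokenization is adequate (it is not for Chinese, which would need word segmentation first). The English documents and the names `cosine` and `rank` are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, documents):
    """Return document names ordered from most to least similar to the query."""
    q = Counter(query.lower().split())
    scores = {name: cosine(q, Counter(text.lower().split())) for name, text in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical documents d1..d3, standing in for transcribed text or speech.
documents = {
    "d1": "the president-elect announced the new government this morning",
    "d2": "stock prices fell this morning",
    "d3": "sports scores from last night",
}
ranking = rank("news about the new government", documents)
```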

Page 14: Human – Network Voice Interface in A Wireless Era

Multi-lingual Functionalities

• Code-Switching Problem
– English words/phrases inserted in spoken Chinese sentences
  人人都用 Computers,家家都上 Internet ("Everyone uses computers, every household is on the Internet")
– the whole sentence switched to English
  準備好了嗎? Let’s go! ("Are you ready? Let’s go!")

• Cross-language Network Information Processing
– globalized network with multi-lingual content/users
– cross-language network information processing with spoken Chinese language input as an example

• Chinese Dialects/Accents
– Taiwanese, Cantonese, Shanghainese, etc.
– hundreds of Chinese dialects
– code-switching problem: dialects mixed with Mandarin (or plus English)
– Mandarin with a variety of strong accents

• Language Dependent/Independent Technologies
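For written transcripts, the code-switching example on this slide can be split into language runs by character script. This is only a text-level sketch under the assumption that Chinese appears as CJK ideographs and English as ASCII letters; detecting the switch in the speech signal itself is a much harder problem that this does not address. The function names are hypothetical.

```python
def script_of(ch):
    """Classify a character as 'zh' (CJK ideograph), 'en' (ASCII letter) or None."""
    if "\u4e00" <= ch <= "\u9fff":
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None  # punctuation, digits, whitespace

def segment_by_language(text):
    """Split a code-switched sentence into (language, text) runs."""
    runs = []
    for ch in text:
        lang = script_of(ch)
        if lang is None:
            # Attach neutral characters to the current run, if there is one.
            if runs:
                runs[-1][1] += ch
            continue
        if runs and runs[-1][0] == lang:
            runs[-1][1] += ch
        else:
            runs.append([lang, ch])
    # Trim spaces and common punctuation left at the edges of each run.
    return [(lang, chunk.strip(" ,,.。!!??")) for lang, chunk in runs]

segments = segment_by_language("人人都用 Computers,家家都上 Internet")
```

Each run could then be routed to a language-dependent recognizer or lexicon, while the segmentation step itself stays language-independent.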

Page 15: Human – Network Voice Interface in A Wireless Era

Spoken Dialogue Systems

• Almost all human-network interactions can be made by spoken dialogue

• Speech understanding
• System/user/mixed initiatives
• Reliability/efficiency, dialogue modeling/flow control

[Figure: block diagram. Users reach the Dialogue Server over Networks and the Internet. Inside the server: Input Speech → Speech Recognition and Understanding → User’s Intention → Dialogue Manager → Response to the user → Sentence Generation and Speech Synthesis → Output Speech. The Dialogue Manager uses the Discourse Context and Databases.]
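The role of the dialogue manager can be sketched as a small slot-filling loop: it merges each understood user intention into the discourse context and either asks for what is missing or answers. The flight-lookup task, the slot names, and the response wording are hypothetical; recognition, understanding, database access, and synthesis are assumed to happen elsewhere.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueManager:
    """Keeps the discourse context and decides the next response."""
    required_slots: tuple = ("origin", "destination")
    context: dict = field(default_factory=dict)

    def respond(self, intention):
        """Merge the user's intention into the context and return a response."""
        self.context.update(intention)
        for slot in self.required_slots:
            if slot not in self.context:
                # System initiative: ask for the first missing piece.
                return f"Please tell me the {slot}."
        return "Looking up flights from {origin} to {destination}.".format(**self.context)

manager = DialogueManager()
first = manager.respond({"destination": "Tokyo"})   # origin still unknown
second = manager.respond({"origin": "Taipei"})      # context now complete
```

Because the context persists between turns, the user can supply the slots in any order, which is one simple form of the mixed initiative mentioned on the slide.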