Multimodal Apps: Tablet PC & Speech Development in .NET casey chesnut brains-N-brawn.com Wisconsin .NET June 2005

Multimodal Apps: Tablet PC & Speech Development in .NET

casey chesnut
brains-N-brawn.com

Wisconsin .NET June 2005

Source Code

• The associated source can be found here:
  – http://www.brains-n-brawn.com/artifacts/ugTabletSpeech.zip

Seamless Computing

• Advanced Web Services (MVP05)

• Compact Framework (MVP04)

• MapPoint

• Tablet PC (MVP03)

• Speech

• Artificial Intelligence

• Direct3D

• Media Center

Questions

• How many programmers?
  – Tablet PC
  – Speech
  – Media Center

Outline

• Tablet PC

• Speech
  – Speech API (SAPI)
  – Speech Application SDK (SASDK)
  – Speech Server

• Demo
  – Tablet and Speech
  – Media Center and Speech

Outline : Tablet PC

• Development environment

• How it works

• Working with Ink

• Opinion

• Future

Development Environment

• Windows XP Pro (non Tablet edition)

• Visual Studio .NET 1.1

• Tablet PC SDK 1.7
  – http://www.microsoft.com/downloads/details.aspx?familyid=b46d4b83-a821-40bc-aa85-c9ee3d6e9699&displaylang=en

• Recognizer Pack
  – http://www.microsoft.com/downloads/details.aspx?FamilyId=080184DD-5E92-4464-B907-10762E9F918B&displaylang=en

• Digitizer Board
  – http://www.wacom.com/productinfo/index.cfm

• Tablet PC

How Ink works

• Digitizer collects stroke information

• Strokes are broken up into characters / words / drawings

• Character / word stroke info is transformed into some feature set

• Feature set is run through some sort of pre-trained AI

• Output is mapped to a dictionary of words

Demo

• Digitizer collects stroke information

• Tablet PC Inspector
  – http://codebetter.com/blogs/peter.van.ooijen/archive/0001/01/01/56161.aspx

Demo

• Strokes are broken up into characters / words / drawings

• InkDivider
  – Tablet PC SDK Sample
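
The division step can be sketched without the SDK. The Python below is not the InkDivider sample or the Tablet PC SDK API; it is a minimal illustration that assumes each stroke is a list of (x, y) points and that a horizontal gap wider than a hypothetical `gap` threshold starts a new word.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def bounds_x(stroke: Stroke) -> Tuple[float, float]:
    """Return the leftmost and rightmost x of a stroke."""
    xs = [x for x, _ in stroke]
    return min(xs), max(xs)


def divide_into_words(strokes: List[Stroke], gap: float = 20.0) -> List[List[Stroke]]:
    """Group strokes into words: a horizontal gap wider than `gap` starts a new word.

    `gap` is a hypothetical threshold in digitizer units, chosen for illustration.
    """
    words: List[List[Stroke]] = []
    right_edge = None
    # Process strokes left to right so gaps are measured between neighbours.
    for stroke in sorted(strokes, key=lambda s: bounds_x(s)[0]):
        left, right = bounds_x(stroke)
        if right_edge is None or left - right_edge > gap:
            words.append([])  # gap is wide enough: start a new word
            right_edge = right
        else:
            right_edge = max(right_edge, right)
        words[-1].append(stroke)
    return words


# Three strokes: two close together, one far to the right.
sample = [
    [(0, 0), (10, 30)],
    [(12, 0), (25, 30)],
    [(80, 0), (95, 30)],
]
print([len(word) for word in divide_into_words(sample)])  # prints [2, 1]
```

A real divider also has to separate lines, drawings, and overlapping strokes, which a single horizontal threshold cannot do.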

Demo

• Character / word stroke info is transformed into some feature set

• Feature set is run through some sort of pre-trained AI

• Demo
  – /aiTabletOcr

• Article
  – http://www.brains-N-brawn.com/aiTabletOcr/
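
As a rough illustration of these two steps, the Python sketch below turns a stroke into a small feature set and hands it to a stand-in for the pre-trained model. Everything here is an assumption made for the example: the direction-histogram features, the `PROTOTYPES` table, and the nearest-prototype classifier are not taken from the aiTabletOcr demo or from the Tablet PC recognizers.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def direction_histogram(stroke: List[Point], bins: int = 4) -> List[float]:
    """Feature set: share of stroke length travelled in each direction sector.

    With bins=4 the sectors are centred on +x, +y, -x and -y.
    """
    hist = [0.0] * bins
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        sector = int((angle + math.pi / bins) / (2 * math.pi / bins)) % bins
        hist[sector] += length
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical "pre-trained" model: one prototype feature vector per label.
PROTOTYPES: Dict[str, List[float]] = {
    "-": [1.0, 0.0, 0.0, 0.0],  # all motion along +x
    "|": [0.0, 1.0, 0.0, 0.0],  # all motion along +y
    "L": [0.5, 0.5, 0.0, 0.0],  # along +y, then along +x
}


def classify(stroke: List[Point]) -> str:
    """Return the label whose prototype is closest to the stroke's features."""
    features = direction_histogram(stroke)
    return min(PROTOTYPES, key=lambda label: math.dist(features, PROTOTYPES[label]))


print(classify([(0, 0), (0, 10), (10, 10)]))  # prints L
```

The fixed-length feature vector is the point: strokes of any length and sampling rate are reduced to the same shape of input before the trained model sees them.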

Demo

• Output is mapped to a dictionary of words

• Dictionary Tool
  – http://blogs.msdn.com/omars/archive/2004/04/15/113597.aspx

• Article
  – http://www.brains-N-brawn.com/tabletDic/
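
One simple way to picture the dictionary step is to snap the recognizer's raw character output to the closest known word. The Python sketch below uses Levenshtein edit distance for that; it is an assumption for illustration, not how the Tablet PC recognizer or the Dictionary Tool works, and the `WORDS` list is made up.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def snap_to_dictionary(raw: str, dictionary: Iterable[str]) -> str:
    """Return the dictionary word with the smallest edit distance to `raw`."""
    return min(dictionary, key=lambda word: edit_distance(raw.lower(), word.lower()))


# Hypothetical dictionary; a real one would also carry word frequencies.
WORDS = ["tablet", "table", "speech", "stylus"]
print(snap_to_dictionary("tab1et", WORDS))  # prints tablet
```

Adding domain words to the dictionary is what lets a recognizer return terms it would otherwise "correct" into something more common.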

Working with Ink

• InkControls

• InkOverlay
  – Collection
  – Recognition

• RealTimeStylus

• Ink on the web

Ink Controls

• InkEdit

• InkPicture

• Code from scratch

InkOverlay

• Collection

• Recognition

• Demo apps

RealTimeStylus

• RealTimeStylusPlugin
  – Tablet PC SDK Sample

Ink on the Web

• IE only

• InkBlogWeb
  – Tablet PC SDK Sample

• Article
  – http://www.brains-N-brawn.com/tabletWeb/

Opinion

• Green Light
  – Tablet PC Edition 2005 improved recognition and usability dramatically
  – Recognizer Pack made development more accessible
  – Language Support
    • Chinese (Traditional and Simplified), U.S. English, U.K. English, French, German, Italian, Japanese, Korean, Spanish

Possible Future

• VS.NET 2005?

• Avalon?

• Will IE7 have tighter integration with ink?

• Longhorn – baked in

• Possibility for training ink recognition

What about Pocket PCs

• Handwriting Recognition

• Form factors

Outline : Speech

• How does it work?
  – Synthesis (TTS)
  – Recognition (SR)

• Development
  – Speech API (SAPI)
  – Speech Application SDK (SASDK)
  – Speech Server (MSS)

How Synthesis Works

• Text is converted to phonemes

• Phonemes are appended together

• Audio is played back

• Demo
  – /ttSpeech app

• Article
  – http://www.brains-N-brawn.com/ttSpeech/
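
The three synthesis steps can be mimicked in a few lines of Python. This is a toy under stated assumptions, not the ttSpeech app or a SAPI engine: the `LEXICON` and `PHONEME_TONE_HZ` tables are invented, each phoneme is represented by a short sine tone rather than recorded speech, and the result is returned as WAV data instead of being played.

```python
import io
import math
import struct
import wave

RATE = 8000  # samples per second

# Hypothetical lexicon and phoneme inventory, for illustration only.
LEXICON = {"hi": ["HH", "AY"], "ink": ["IH", "NG", "K"]}
PHONEME_TONE_HZ = {"HH": 300, "AY": 440, "IH": 520, "NG": 260, "K": 700}


def text_to_phonemes(text):
    """Step 1: look each word up in the lexicon."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def phoneme_unit(phoneme, seconds=0.1):
    """Stand-in audio unit: a short sine tone as 16-bit mono PCM."""
    hz = PHONEME_TONE_HZ[phoneme]
    count = int(RATE * seconds)
    samples = (int(12000 * math.sin(2 * math.pi * hz * n / RATE)) for n in range(count))
    return b"".join(struct.pack("<h", s) for s in samples)


def synthesize(text):
    """Steps 2 and 3: append the units and wrap them as WAV data for playback."""
    pcm = b"".join(phoneme_unit(p) for p in text_to_phonemes(text))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(RATE)
        out.writeframes(pcm)
    return buffer.getvalue()


print(text_to_phonemes("hi ink"))  # prints ['HH', 'AY', 'IH', 'NG', 'K']
print(len(synthesize("hi ink")), "bytes of WAV data")
```

Simply butting units together is why naive concatenation sounds choppy; production engines smooth the joins and adjust pitch and duration.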

How Recognition Works

• Audio wav is transformed to some meaningful form

• Phonemes are found in audio signals

• Phonemes are mapped to a dictionary of words

• Demo
  – wavReader app

• Article
  – http://www.brains-N-brawn.com/noReco/
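
The first step, turning raw audio into "some meaningful form", usually means cutting the signal into short frames and computing features per frame. The Python sketch below uses two very simple features, RMS energy and zero-crossing count, on synthetic samples. It is an assumption for illustration and not what the wavReader app does; real recognizers typically use spectral features, and the `frame_size` of 160 samples and the 0.1 energy threshold are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(samples: Sequence[float], frame_size: int = 160) -> List[Tuple[float, int]]:
    """Return (RMS energy, zero-crossing count) for each full frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        # A crossing is a sign change between neighbouring samples.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings))
    return features


# Synthetic input: one frame of silence, then one frame of a 400 Hz tone at 8 kHz.
RATE = 8000
silence = [0.0] * 160
tone = [math.sin(2 * math.pi * 400 * n / RATE) for n in range(160)]

for rms, crossings in frame_features(silence + tone):
    print(f"rms={rms:.3f} crossings={crossings} speech-like={rms > 0.1}")
```

Later stages work on these per-frame feature vectors rather than on raw samples, in the same way the ink recognizer works on stroke features rather than raw points.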

Speech API (SAPI)

• Old school COM

• Windows applications

• Can do dictation

• Demo
  – SAPI app

Opinion

• Yellow light
  – It works, but is aging
  – Has to be trained for dictation
  – Limited language support

• Green light for Tablet PCs
  – Tablet PC has recognition and synthesis engines installed
  – Some Tablets have microphone arrays built in

Future

• System.Speech
  – Simple API
  – Reflection capabilities
  – Standards support (SSML, SRGS)
  – Engines should be improved from all the Speech Server work

What about Pocket PCs

• OEMs can add VoiceCommand

• WindowsMobile has the SAPI API, but no engines

• PlatformBuilder is supposed to have engines

• There are 3rd party engines for purchase

Speech Application SDK

• VS.NET 1.1 integration

• For web based apps
  – Voice-only telephony
  – Multimodal browser

• Demo
  – Code voice-only from scratch

• Article
  – http://www.brains-N-brawn.com/noHands/

SASDK

• Speech Synthesis
  – Inline
  – Code behind
  – Prompt functions
  – Prompt databases

• Speech Recognition
  – Inline
  – Static Grammar
  – Dynamic Grammar
  – DTMF
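
The difference between a static and a dynamic grammar is only when the grammar is produced: a static one is authored ahead of time, a dynamic one is generated at run time from data. The Python sketch below builds a small grammar from a list of phrases, assuming the grammar is expressed in W3C SRGS XML. The `build_grammar` helper and the phrase list are hypothetical, and how the SASDK attaches such a grammar to a control is not shown here.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace


def build_grammar(phrases, rule_id="choice", lang="en-US"):
    """Build an SRGS grammar whose root rule accepts any one of `phrases`."""
    grammar = ET.Element(f"{{{SRGS_NS}}}grammar", {
        "version": "1.0",
        "root": rule_id,
        XML_LANG: lang,
    })
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


# Phrases would normally come from a database or the current page state.
print(build_grammar(["play music", "show pictures", "watch tv"]))
```

Keeping the phrase list small and specific to the current prompt is what makes grammar-based recognition usable without the per-user training that dictation needs.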

Speech Server

• Runs SASDK applications

• Primarily for Voice-only apps

• Also for Multimodal PocketPC apps

• Speech Language Packs
  – North American Spanish
  – Canadian French

• Article
  – http://www.brains-N-brawn.com/speechMulti/

Deployment

Opinion

• Green light for Voice-Only
  – Great tool support
  – Cheap hardware
  – Language support

• Red light for Multimodal
  – Standards battle with VoiceXML
  – IE Speech Add-Ins are not accessible
  – Pocket IE Speech Add-In not updated for R2 release, nor does it support Smartphone

Possible Future

• VS.NET 2005?

• XAML?

• Will IE7 have voice browsing built-in?

• Other browsers to add SALT support?

• Pocket IE Professional?

Combo Demos

• Ink and Speech (WinForm)
  – InkCollection app
  – http://www.brains-N-brawn.com/tabletStrator/

• Ink and Speech (WebForm)
  – Video
  – http://www.brains-N-brawn.com/tabletWeb/

• Remote and Speech (AddIn)
  – http://www.brains-N-brawn.com/mceSAPI/

• Remote and Speech (HostedHTML)
  – http://www.brains-N-brawn.com/mceSALT/

Questions