Speech Enhanced Gesture Based Navigation System for Google Maps
An Exploration in Multimodal HCI

Under the Guidance of: Asst. Professor Manoj Majhi
Vikas Luthra | Himanshu Bansal | Maulishree Pandey

Goal of Our Journey

Abstract

• The conventional way of using the different features of Google Maps on touch-based devices relies on the touch gestures defined for those devices.

• For certain installations such as public kiosks and large touchscreens, it is also possible to define in-air or 3D gestures.

• Coupled with basic speech commands, a new set of interactions can be designed for accessing Google Maps.

• However, it is important to measure the usability of this new set of gestures against the conventional touch-based gestures before any substitution is considered.

Final Destination: Aim

• Define the gestures and speech commands for the features of Google Maps, and evaluate them against the existing interactions

• Compare and evaluate the usability of 3D gestures combined with speech against touch-based gestures for using Google Maps on a large touchscreen

The Route to Follow for Our Journey: Methodology

Literature Research (Aug 1st week – Sept 1st week)
• Background of the technologies
• Multimodal HCI theory
• Similar works

System Definition and Design (Sept 2nd week – Oct 1st week)
• Decide the case-study features of Google Maps
• Use-case scenarios
• Feature-wise gesture definitions
• Addition of voice commands where gesture control is not applicable

Prototype Development (Oct 2nd week – Nov 4th week)
• Skeleton-based gesture tracking system development
• Speech recognition system development
• Debugging and refinement

Comparative Study (Next Semester)
• Experiments comparing the two solutions, with their different gestures and voice commands
• Statistical analysis

Conclusion (Next Semester)
• Inferences and guidelines

Mode of Transportation: Microsoft Kinect

Microsoft Kinect

• The Kinect sensor can build a 'depth map' of the area in front of it.

• This depth map is used to estimate the distance of the various objects in front of the Kinect.

• One of its most popular uses is recognizing and tracking people standing in front of the sensor.

• Kinect also has an array of four microphones to pick up audio.
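
As an illustration of how this depth map can be read in code, here is a minimal C# sketch against the Kinect for Windows SDK v1.x (the Microsoft.Kinect namespace). It simply prints the distance at the centre pixel and assumes a single connected sensor, so it is a sketch rather than part of our prototype.

    using System;
    using Microsoft.Kinect;

    class DepthMapDemo
    {
        static void Main()
        {
            // Simplification: take the first sensor and assume it is connected.
            KinectSensor sensor = KinectSensor.KinectSensors[0];

            // Enable the depth stream at 640x480, 30 frames per second.
            sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);

            sensor.DepthFrameReady += (s, e) =>
            {
                using (DepthImageFrame frame = e.OpenDepthImageFrame())
                {
                    if (frame == null) return;

                    short[] pixels = new short[frame.PixelDataLength];
                    frame.CopyPixelDataTo(pixels);

                    // Each value packs a player index in the low bits and the
                    // distance in millimetres in the high bits.
                    int centreMm = pixels[pixels.Length / 2] >> DepthImageFrame.PlayerIndexBitmaskWidth;
                    Console.WriteLine("Distance at centre pixel: " + centreMm + " mm");
                }
            };

            sensor.Start();
            Console.ReadLine();   // keep the console app alive while frames arrive
            sensor.Stop();
        }
    }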

Kinect for Windows SDK

• Microsoft provides this SDK free for use and experimentation, but without permission for commercial distribution. The SDK contains APIs that track people in front of the Kinect and provide the coordinates of their body joints.

• There are APIs that recognize basic, common hand gestures such as grip and release.

• Speech APIs are provided to capture audio and program it for our use.

“We will be using the Kinect for Windows SDK and the Kinect for Xbox 360 to design gestures and to recognize certain speech commands. Development will take place in Microsoft Visual Studio 2010, using the C# programming language.”
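
As a sketch of how the joint-tracking APIs described above could feed our gesture definitions, the following C# fragment (again written against the Kinect for Windows SDK v1.x; the choice of the right hand and the console output are purely illustrative) prints the position of a tracked user's right hand on every skeleton frame.

    using System;
    using System.Linq;
    using Microsoft.Kinect;

    class SkeletonTrackingDemo
    {
        static void Main()
        {
            // Use the first Kinect that reports itself as connected.
            KinectSensor sensor = KinectSensor.KinectSensors
                .FirstOrDefault(s => s.Status == KinectStatus.Connected);
            if (sensor == null) return;

            sensor.SkeletonStream.Enable();
            sensor.SkeletonFrameReady += OnSkeletonFrameReady;
            sensor.Start();

            Console.ReadLine();   // run until Enter is pressed
            sensor.Stop();
        }

        static void OnSkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
        {
            using (SkeletonFrame frame = e.OpenSkeletonFrame())
            {
                if (frame == null) return;

                Skeleton[] skeletons = new Skeleton[frame.SkeletonArrayLength];
                frame.CopySkeletonDataTo(skeletons);

                foreach (Skeleton skeleton in skeletons)
                {
                    if (skeleton.TrackingState != SkeletonTrackingState.Tracked) continue;

                    // Joint positions are in metres, relative to the sensor.
                    Joint rightHand = skeleton.Joints[JointType.HandRight];
                    Console.WriteLine("Right hand: X={0:F2} Y={1:F2} Z={2:F2}",
                        rightHand.Position.X, rightHand.Position.Y, rightHand.Position.Z);
                }
            }
        }
    }

A concrete gesture (for example, a swipe to pan the map) would then be defined as a pattern over these joint coordinates across successive frames.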

Mode of Transportation: Speech Recognition

What is needed

1. Acoustic Model
Probabilistic models that try to learn the relationship between voice utterances and their transcriptions in the training data

2. Language Model
Unigram, bigram, and trigram statistics; not much is needed in our case

3. Mapping Dictionary
Grapheme-to-phoneme mapping
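
For illustration, a grapheme-to-phoneme dictionary in the style used by CMU Sphinx simply maps each written word to a sequence of phones (ARPAbet symbols). The command and place words below are hypothetical examples of what our navigation vocabulary might contain, not entries taken from an actual dictionary file:

    ZOOM    Z UW M
    IN      IH N
    OUT     AW T
    PAN     P AE N
    LEFT    L EH F T
    RIGHT   R AY T
    DELHI   D EH L IY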

Current Challenges

1. Large variability in accents

2. Variability across speakers and genders

3. Surrounding noise

4. The large number of city and place names to be recognized

Development Tools

1. Microsoft Speech SDK 5.1
Preferable, since it works well with the Microsoft Kinect

2. CMU Sphinx 0.8
Open-source toolkit for speech recognition

3. Dragon SDKs - Nuance
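
To make the speech side concrete, here is a minimal C# sketch of grammar-based command recognition. It uses the System.Speech wrapper that ships with .NET (a close relative of the Microsoft Speech SDK listed above); the command phrases and the 0.6 confidence threshold are placeholders, not our final design:

    using System;
    using System.Speech.Recognition;

    class SpeechCommandDemo
    {
        static void Main()
        {
            // Restrict the recognizer to a small set of map commands instead of
            // free-form dictation; this sidesteps most of the vocabulary problem.
            var commands = new Choices("zoom in", "zoom out", "pan left", "pan right", "show traffic");
            var grammar = new Grammar(new GrammarBuilder(commands));

            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.LoadGrammar(grammar);
                recognizer.SetInputToDefaultAudioDevice();

                recognizer.SpeechRecognized += (sender, e) =>
                {
                    // Ignore low-confidence results (threshold chosen arbitrarily).
                    if (e.Result.Confidence >= 0.6)
                        Console.WriteLine("Command: " + e.Result.Text);
                };

                recognizer.RecognizeAsync(RecognizeMode.Multiple);
                Console.ReadLine();   // keep listening until Enter is pressed
            }
        }
    }

In the prototype, the recognized command string would be mapped to the corresponding Google Maps action, alongside the gesture input.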

Discussions & Conclusion

1. Speech input is about 4 times faster than typing

2. Touch interaction on a vertical screen can cause the "gorilla arm" effect

3. Free-hand gestures have been used before for navigation systems

4. We assume improved ease of use from integrating these two modalities

5. A training corpus of Indian-accented speakers is needed for the ASR system

6. The study variables still need to be defined

Thank You for Listening

Picture abhi baaki hai mere dost (our journey still continues)…