Reducing uncertainty in speech recognition

Controlling mobile devices through voice-activated commands

Neil Gow, GWXNEI001
Stephen Breyer-Menke, BRYSTE003

Supervisor: Audrey Mbogho

Introduction
• Variety of applications
  • Word processing
  • In-car voice activation
  • Over-the-phone automated business systems
  • Mobile phone interactions
  • Biometric identification

Introduction
• AT&T Bell Labs, 1936
• Processing power was the initial barrier
• Speeds of up to 160 wpm are possible, with accuracy of 95%

Introduction
• Why use command-based interfaces on cell phones?
  • Small keypads
  • Hands free
  • No required visual feedback
  • Quick access to common functions

How it works
• Analogue sound waves are converted to digital format
• The acoustical model breaks the digitized input into phonemes
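As a rough illustration of the first step, the sketch below samples a continuous signal at fixed intervals and quantizes each sample to a signed integer. The function names `digitize` and `tone`, the 8 kHz sample rate, and the 8-bit depth are assumptions chosen for the example; they are not taken from Sphinx or NICO.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=8000, bits=8):
    """Sample a continuous signal and quantize each sample to a signed integer."""
    levels = 2 ** (bits - 1) - 1                    # 127 for 8-bit signed samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # time of the n-th sample, in seconds
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip to the range [-1, 1]
        samples.append(round(amplitude * levels))   # map to an integer level
    return samples


def tone(t):
    # Hypothetical "analogue" input: a 440 Hz sine wave
    return math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), min(pcm), max(pcm))
```

A real recognizer would go on to cut these samples into short frames and extract acoustic features before any phoneme modelling takes place.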

How it works
• Phonemes are analysed in the context of the phonemes around them
• This is done according to a statistical model to identify the assumed spoken word
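One way to picture this context-dependent analysis is Viterbi decoding over a hidden Markov model, where the hidden states are phonemes and the observations are coarse acoustic labels. The sketch below is a toy example: the phoneme set, the observation labels, and all probabilities are invented for illustration and do not come from any real acoustic model.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            column[s] = max(
                (prev[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)
    # Trace the best path backwards from the most probable final state
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Hypothetical model for the word "yes": phonemes y, eh, s
states = ["y", "eh", "s"]
start_p = {"y": 0.8, "eh": 0.1, "s": 0.1}
trans_p = {
    "y": {"y": 0.3, "eh": 0.6, "s": 0.1},
    "eh": {"y": 0.1, "eh": 0.4, "s": 0.5},
    "s": {"y": 0.1, "eh": 0.1, "s": 0.8},
}
emit_p = {
    "y": {"glide": 0.7, "vowel": 0.2, "hiss": 0.1},
    "eh": {"glide": 0.2, "vowel": 0.7, "hiss": 0.1},
    "s": {"glide": 0.1, "vowel": 0.1, "hiss": 0.8},
}

path, prob = viterbi(["glide", "vowel", "vowel", "hiss"], states, start_p, trans_p, emit_p)
print(path)  # ['y', 'eh', 'eh', 's']
```

Each phoneme is chosen in light of its neighbours: the transition probabilities reward plausible phoneme orderings, so a noisy frame is less likely to flip the decoded sequence.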

Available models
• Neural Networks
• Dynamic time warping
• Knowledge-based speech recognition
• The hidden Markov Model
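Of the models listed, dynamic time warping is the simplest to show in a few lines. The sketch below aligns two one-dimensional feature sequences that may be spoken at different speeds and returns the accumulated mismatch. The sequences are made-up numbers standing in for acoustic features, and `dtw_distance` is a name chosen for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local mismatch between two frames
            cost[i][j] = d + min(
                cost[i - 1][j],                   # a advances, b frame is reused
                cost[i][j - 1],                   # b advances, a frame is reused
                cost[i - 1][j - 1],               # both advance
            )
    return cost[n][m]


template = [1, 2, 3, 2, 1]                 # stored pattern for a command
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]      # same pattern, spoken at half speed
other = [3, 1, 3, 1, 3]                    # a different pattern

print(dtw_distance(template, slow))   # 0.0
print(dtw_distance(template, other) > dtw_distance(template, slow))  # True
```

A command recognizer built this way would compare the input against one stored template per command and pick the template with the smallest distance.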

The Toolkits we will be using
• The Sphinx Project
  • Hidden Markov Model
• The NICO Toolkit
  • Artificial neural network

Our Problem Domain
• Evaluating the two models' performance
• Assessing the applicability of the models in mobile environments
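The slides do not say which evaluation criteria will be used. Word error rate is one widely used measure of recognition performance, and the sketch below shows how it could be computed; its use here is an assumption, and `word_error_rate` is a name chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                            # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j                            # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,               # deletion
                dist[i][j - 1] + 1,               # insertion
                dist[i - 1][j - 1] + substitution,
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("call home now", "call home now"))   # 0.0
print(word_error_rate("call home now", "call phone now"))  # one substitution in three words
```

Running both toolkits on the same set of recorded commands and comparing a figure like this would give a like-for-like accuracy comparison; memory use and response time would need separate measurements for the mobile setting.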

Our Approach
• We will be implementing and comparing two software packages
• Scaling the packages for mobile devices
• Testing them in a simulated mobile environment
• If feasible, we will be implementing the preferred package on a mobile device

The Sphinx Project
• Carnegie Mellon University
• Funded by DARPA
• Open source (BSD-style licence)
• Latest version written in Java
• Based on Hidden Markov Models

The NICO Toolkit
• Neural Inference COmputation
• Developed during 1993-1997
• Open source (BSD)
• Written in C
• Written for UNIX
• Focused on speech recognition
• General-purpose neural network software

Division Of Work
• Both
  • Designing evaluation criteria
• Neil
  • Research Hidden Markov Model
  • Implement and Scale Sphinx
  • Evaluate Sphinx
• Steve
  • Research Neural Networks
  • Implement and Scale NICO
  • Evaluate NICO
• Both
  • Mobile implementation

Timeline
[Gantt chart: time axis from 01 May 2007 to 28 October 2007, marked at 20-day intervals]
• Research General Problem
• Research Individual Models
• Designing Evaluation Criteria
• Implementing Software Packages
• Scaling Software Packages
• Testing and Evaluation
• Mobile Implementation
• Deliverables
Legend: Start Date, Completed, Remaining

Risks
• Failure to implement and scale the packages
• Lack of sufficient documentation for the packages
• Failure to understand how they work
• Falling behind schedule

Goals
• Further the research on speech recognition
• Determine the effectiveness of these algorithms in mobile environments
• Produce a working prototype that can be run on mobile devices
