16
Reducing uncertainty in speech recognition
Controlling mobile devices through voice activated commands
Neil Gow, GWXNEI001
Stephen Breyer-Menke, BRYSTE003
Supervisor: Audrey Mbogho




Introduction
• Variety of applications
  • Word processing
  • In-car voice activation
  • Over-the-phone automated business systems
  • Mobile phone interactions
  • Biometric identification


Introduction
• AT&T Bell Labs, 1936
• Processing power was the initial barrier
• Speeds of up to 160 words per minute (wpm) are possible, with 95% accuracy
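A worked figure helps show what these headline numbers mean in practice. It assumes that "95% accuracy" is measured per word. At full dictation speed, roughly eight words per minute would still need correcting:

```latex
\text{word errors per minute} = 160\ \text{wpm} \times (1 - 0.95) = 8
```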


Introduction
• Why use command-based interfaces on cell phones?
  • Small keypads
  • Hands-free operation
  • No visual feedback required
  • Quick access to common functions


How it works
• Analogue sound waves are converted to digital format
• The acoustic model breaks the digitised input into phonemes
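Below is a minimal sketch of the digitisation step, together with the fixed-length framing that usually comes before an acoustic model. The values are assumptions rather than settings taken from Sphinx or NICO: a 16 kHz sample rate, 16-bit quantisation, 25 ms frames and a 10 ms hop. The per-frame log energy is only a stand-in for the richer features, such as MFCCs, that real front ends compute. The acoustic model itself is not implemented here.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed; common for speech)
FRAME_MS = 25        # frame length in milliseconds (assumed)
HOP_MS = 10          # step between frame starts in milliseconds (assumed)


def digitise(signal_fn, duration_s, sample_rate=SAMPLE_RATE):
    """Sample a continuous signal and quantise it to 16-bit integers."""
    n = round(duration_s * sample_rate)
    samples = []
    for i in range(n):
        x = signal_fn(i / sample_rate)          # analogue value, nominally in [-1, 1]
        x = max(-1.0, min(1.0, x))              # clip anything out of range
        samples.append(int(round(x * 32767)))   # 16-bit PCM quantisation
    return samples


def frames(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS, hop_ms=HOP_MS):
    """Split PCM samples into overlapping fixed-length frames."""
    size = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return [samples[start:start + size]
            for start in range(0, len(samples) - size + 1, hop)]


def log_energy(frame):
    """A very simple per-frame feature (real systems typically use MFCCs)."""
    return math.log(sum(s * s for s in frame) + 1e-10)


# Usage: a 0.1 s, 440 Hz tone at half amplitude stands in for speech
tone = lambda t: 0.5 * math.sin(2 * math.pi * 440 * t)
pcm = digitise(tone, 0.1)
fs = frames(pcm)
features = [log_energy(f) for f in fs]
```

With these settings, 0.1 s of audio gives 1600 samples and eight overlapping 400-sample frames.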


How it works
• Each phoneme is analysed in the context of the phonemes around it
• A statistical model then identifies the most likely spoken word
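A common way to make phonemes context-dependent is to model triphones, in which each phoneme is paired with its left and right neighbours. The sketch below assumes HTK-style `left-centre+right` notation and uses `sil` (silence) as the context at word boundaries. The pronunciation of "call" is a hypothetical example rather than an entry from either toolkit's dictionary.

```python
def to_triphones(phones):
    """Expand a phoneme sequence into context-dependent triphones.

    Each phoneme is written as left-centre+right; 'sil' (silence) is
    assumed as the context before the first and after the last phoneme.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]


# Hypothetical pronunciation of the command "call": k ao l
print(to_triphones(["k", "ao", "l"]))
# ['sil-k+ao', 'k-ao+l', 'ao-l+sil']
```

The statistical model then scores sequences of these context-dependent units rather than isolated phonemes.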


Available models
• Neural networks
• Dynamic time warping
• Knowledge-based speech recognition
• Hidden Markov models
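Dynamic time warping compares a spoken input against stored templates while tolerating differences in speaking rate. The minimal sketch below uses one-dimensional feature sequences and absolute difference as the local cost. The two command templates are hypothetical values chosen for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b[j-1] repeats
                                     cost[i][j - 1],      # b advances, a[i-1] repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognise(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "feature" templates for two commands
templates = {"call": [1, 3, 4, 3, 1], "stop": [5, 5, 2, 0, 0]}
print(recognise([1, 1, 3, 4, 4, 3, 1], templates))  # call
```

The slower utterance still matches "call" with zero cost, because warping lets repeated frames align with a single template frame.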


The Toolkits we will be using
• The Sphinx Project
  • Hidden Markov models
• The NICO Toolkit
  • Artificial neural networks
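Recognisers based on hidden Markov models typically decode with the Viterbi algorithm, which finds the most likely sequence of hidden states for a sequence of observations. The sketch below is a generic textbook version for a discrete HMM and does not use the Sphinx API. The three phoneme states, the quantised observation symbols and all probabilities are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM, computed in the log domain."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s] = (log prob of best path ending in state s at time t, previous state)
    best = [{s: (log(start_p[s]) + log(emit_p[s][observations[0]]), None)
             for s in states}]
    for t in range(1, len(observations)):
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1])
            column[s] = (score + log(emit_p[s][observations[t]]), prev)
        best.append(column)
    # Trace back from the highest-scoring final state
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path)), best[-1][last][0]


# Hypothetical left-to-right model for the word "cat" (k, ae, t)
states = ["k", "ae", "t"]
start_p = {"k": 1.0, "ae": 0.0, "t": 0.0}
trans_p = {"k": {"k": 0.5, "ae": 0.5, "t": 0.0},
           "ae": {"k": 0.0, "ae": 0.6, "t": 0.4},
           "t": {"k": 0.0, "ae": 0.0, "t": 1.0}}
emit_p = {"k": {"burst": 0.8, "voiced": 0.2},
          "ae": {"burst": 0.1, "voiced": 0.9},
          "t": {"burst": 0.7, "voiced": 0.3}}
path, log_prob = viterbi(["burst", "voiced", "voiced", "burst"],
                         states, start_p, trans_p, emit_p)
print(path)  # ['k', 'ae', 'ae', 't']
```

Working in log probabilities avoids numerical underflow when many small probabilities are multiplied over long utterances.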


Our Problem Domain
• Evaluating the performance of the two models
• Assessing the applicability of the models in mobile environments
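The evaluation criteria are still to be designed. One standard candidate for comparing recognisers is word error rate (WER): the number of substituted, deleted and inserted words divided by the length of the reference transcript. Word accuracy is commonly reported as 1 − WER. The minimal sketch below computes WER as a word-level edit distance. The example transcripts are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


# One substitution in a three-word command
print(word_error_rate("call home now", "call phone now"))
```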


Our Approach
• Implementing and comparing two software packages
• Scaling the packages for mobile devices
• Testing them in a simulated mobile environment
• If feasible, implementing the preferred package on a mobile device
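The slides do not say how the mobile environment will be simulated. One common approach, assumed here, is to mix background noise into clean recordings at a controlled signal-to-noise ratio (SNR). The sketch below scales the noise so that the mixture reaches a requested SNR in decibels. The tone and the Gaussian noise are synthetic placeholders for real speech and real background recordings.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage: a synthetic tone stands in for speech and Gaussian noise for background
random.seed(0)
speech = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Sweeping `snr_db` over a range of values would show how quickly each package's accuracy degrades as conditions get noisier.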


The Sphinx Project
• Carnegie Mellon University
• Funded by DARPA
• Open source (BSD-style licence)
• Latest version written in Java
• Based on hidden Markov models


The NICO Toolkit
• Neural Inference COmputation
• Developed between 1993 and 1997
• Open source (BSD)
• Written in C for UNIX
• Focused on speech recognition
• Also usable as general neural network software
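A neural-network recogniser typically maps each frame's acoustic features to a probability for each phoneme class. The minimal sketch below shows that idea in plain Python: one tanh hidden layer followed by a softmax output. It does not reflect NICO's actual network architecture or interface. The weights are hypothetical and untrained, and training is omitted.

```python
import math


def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(z)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def phoneme_posteriors(features, w1, b1, w2, b2):
    """One tanh hidden layer followed by a softmax over phoneme classes."""
    hidden = [math.tanh(h) for h in dense(features, w1, b1)]
    return softmax(dense(hidden, w2, b2))


# Hypothetical, untrained weights: 3 input features, 2 hidden units, 3 phoneme classes
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
w2 = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.2]]
b2 = [0.0, 0.0, 0.0]
probs = phoneme_posteriors([1.0, 0.5, -0.3], w1, b1, w2, b2)
```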


Division of Work
• Both
  • Designing evaluation criteria
• Neil
  • Research hidden Markov models
  • Implement and scale Sphinx
  • Evaluate Sphinx
• Steve
  • Research neural networks
  • Implement and scale NICO
  • Evaluate NICO
• Both
  • Mobile implementation


Timeline (1 May 2007 to 28 October 2007)
[Gantt chart not reproduced; legend: start date, completed, remaining]
• Research general problem
• Research individual models
• Designing evaluation criteria
• Implementing software packages
• Scaling software packages
• Testing and evaluation
• Mobile implementation
• Deliverables


Risks
• Failure to implement and scale the packages
• Lack of sufficient documentation for the packages
• Failure to understand how they work
• Falling behind schedule


Goals
• Further research on speech recognition
• Determine the effectiveness of these algorithms in mobile environments
• Produce a working prototype that can run on mobile devices