Real-Time Speech Recognition


Thang Pham

Advisor: Shane Cotter

Background

Types of speech recognition systems: Word recognition, Connected speech recognition, Speech understanding systems

Simplest: user-dependent, limited vocabulary

Hard to design any system:

• Variations of speech, e.g. amplitude, duration, and signal-to-noise ratio

• Background noise

• Reverberation noise

Implemented in banking, telephone systems, etc.; commercial example: IBM ViaVoice

Project Outline

Design a user-dependent speech recognition system to control the movement of a small remote control car

Limited in vocabulary: Backward, Forward, Left, and Right

Trained to my voice

Different speech recognition algorithms were examined to understand the advantages and disadvantages of each system

• Linear Predictive Coding

• Cepstrum Coefficients

• Mel-frequency Cepstrum Coefficients

System Design

Microphone

TI 6713 DSP Board

Sample word at 8 kHz

Segment word into time frames

Find Mel-Cepstrum coefficients for each frame

Compare input word to a codebook of defined words using dynamic time warping

Recognized word
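The framing step in the pipeline above can be sketched as follows. This is a minimal illustration only: the slides give the 8 kHz sampling rate but not the frame length or overlap, so `FRAME_LEN`, `FRAME_STEP`, and the function name `split_into_frames` are assumptions chosen for the example.

```python
SAMPLE_RATE_HZ = 8000  # from the slides: the word is sampled at 8 kHz
FRAME_LEN = 256        # assumption: 32 ms frames (not stated in the slides)
FRAME_STEP = 128       # assumption: 50% overlap between consecutive frames


def split_into_frames(samples, frame_len=FRAME_LEN, frame_step=FRAME_STEP):
    """Cut a recorded word into fixed-length, overlapping time frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_step
    return frames


# Half a second of (silent) audio at 8 kHz is 4000 samples.
word = [0.0] * (SAMPLE_RATE_HZ // 2)
frames = split_into_frames(word)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each frame would then be passed to the feature-extraction stage, which produces one coefficient vector per frame.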

Components List

Texas Instruments TMS320C6713 DSP Board

Audio Technica Omnidirectional Microphone ATR35S

Two stepper motors

Linear Predictive Coding

Provides a good model of the speech signal.

Can approximate a speech sample at time n from the past p samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$

where a_1, a_2, …, a_p are coefficients that weight each sample.
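As a small illustration of the prediction formula, the sketch below forms s(n) as a weighted sum of the p previous samples. The function name `lpc_predict` and the test signal are assumptions made for the example. The coefficients are written down by hand rather than estimated from speech: a pure sinusoid of angular frequency w satisfies s(n) = 2cos(w) s(n-1) - s(n-2) exactly, so p = 2 suffices for it.

```python
import math


def lpc_predict(past_samples, coeffs):
    """Predict s(n) as a1*s(n-1) + a2*s(n-2) + ... + ap*s(n-p).

    past_samples: [s(n-1), s(n-2), ..., s(n-p)], most recent first.
    coeffs:       [a1, a2, ..., ap].
    """
    if len(past_samples) != len(coeffs):
        raise ValueError("need exactly one past sample per coefficient")
    return sum(a * s for a, s in zip(coeffs, past_samples))


# Toy signal (assumption): a sinusoid, which a 2nd-order predictor
# reproduces exactly with a1 = 2*cos(w) and a2 = -1.
w = 0.3
signal = [math.sin(w * n) for n in range(10)]
coeffs = [2.0 * math.cos(w), -1.0]

n = 5
predicted = lpc_predict([signal[n - 1], signal[n - 2]], coeffs)
print(predicted, signal[n])
```

Real speech is not a single sinusoid, so in practice the coefficients are fitted per frame and the prediction is only approximate.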

Mel-frequency Cepstrum Coefficients

Research has shown mel-frequency cepstrum coefficients to outperform plain cepstrum coefficients and LPC as speech recognition features

Modeled on the human auditory system (ear)

$$c_n = \sum_{k=1}^{M} (\log S_k) \cos\left[ n \left(k - 0.5\right) \frac{\pi}{M} \right]$$

where c_n is the nth order mel-frequency cepstrum, S_k is the power of the kth mel filter, and M is the number of mel filters.

12 mel-frequency cepstrum coefficients characterize each time frame
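The cosine sum for the mel-frequency cepstrum can be sketched directly, assuming the mel filter powers S_k for a frame are already available. The function name `mel_cepstrum`, the choice of base-10 logarithm, and the 20-filter example are assumptions made for illustration; only the 12 coefficients per frame come from the slides.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12):
    """Compute c_1..c_num_coeffs from the mel filter powers S_1..S_M.

    c_n = sum over k = 1..M of log(S_k) * cos(n * (k - 0.5) * pi / M)
    All filter powers must be strictly positive.
    """
    m = len(filter_powers)
    log_powers = [math.log10(s) for s in filter_powers]  # assumption: log base 10
    return [
        sum(
            log_powers[k - 1] * math.cos(n * (k - 0.5) * math.pi / m)
            for k in range(1, m + 1)
        )
        for n in range(1, num_coeffs + 1)
    ]


# Sanity check with a flat spectrum (assumption: 20 mel filters).
# Equal power in every filter carries no spectral shape, so every
# coefficient should come out numerically zero.
flat = [10.0] * 20
coeffs = mel_cepstrum(flat)
print(len(coeffs), max(abs(c) for c in coeffs))
```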

Dynamic Time Warping

Arrange the mel-frequency cepstrum coefficients of each frame into a vector

Use dynamic time warping to find the best match

Compares words whose utterances differ in duration or speaking rate

You have a reference word that you are listening for

You have a sampled word

Want to compare both words, sampled and reference, and see if they match

Compare mel-frequency cepstrum coefficients for each frame of speech


[Figure: example of DTW, with its solution]
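A minimal sketch of the matching step follows, under stated assumptions: Euclidean distance between frame vectors, the basic three-way step pattern (insert, delete, match), and toy one-dimensional "frames" in place of 12-coefficient vectors. The names `frame_distance`, `dtw_distance`, `recognize`, and the two-word codebook are hypothetical and are not taken from the project code.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(sampled, reference):
    """Accumulated cost of the best time alignment of two frame sequences."""
    n, m = len(sampled), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost aligning the first i sampled frames
    # with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(sampled[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # sampled frame repeats
                                 cost[i][j - 1],      # reference frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(sampled, codebook):
    """Return the codebook word whose template is closest under DTW."""
    return min(codebook, key=lambda word: dtw_distance(sampled, codebook[word]))


# Toy codebook (assumption): one template per word, one feature per frame.
codebook = {
    "left": [[1.0], [2.0], [3.0]],
    "right": [[3.0], [2.0], [1.0]],
}
# The same shape as "left", but spoken more slowly (5 frames instead of 3).
sampled = [[1.0], [1.0], [2.0], [2.0], [3.0]]
print(recognize(sampled, codebook))
```

Because the warping path may repeat frames, the slower utterance aligns to the "left" template at zero cost even though the two sequences have different lengths.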

Results

Word        Recognition Rate

Backward    50%

Forward     70%

Left        90%

Right       40%

Sources of error:

1. Noise, e.g. computer fan, fluorescent light.

2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day.

3. Trained on only one template per word.

Problems Encountered

Warping the frequency axis onto the mel scale, e.g. the Log10 computation

Translation of MATLAB code into C, e.g. dynamic arrays, debugging process

Dynamic time warping, e.g. theory, algorithm

$$F_{mel} = 2595 \log_{10}\left(1 + \frac{F_{Hz}}{700}\right)$$
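The mel-scale warping formula translates directly into code. The sketch below is illustrative only; the function names `hz_to_mel` and `mel_to_hz` are assumptions, and the inverse is obtained simply by solving the same formula for the frequency in Hz.

```python
import math


def hz_to_mel(f_hz):
    """F_mel = 2595 * log10(1 + F_Hz / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of hz_to_mel, obtained by solving the formula for F_Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


# With 8 kHz sampling the usable band ends at the 4 kHz Nyquist frequency.
for f in (0.0, 700.0, 4000.0):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

The inverse is useful when filter centres are spaced evenly on the mel axis and then need to be mapped back to Hz.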

Future Work

• The C implementation of this system is being developed. The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.

• The code will be modified to allow the recognition system to operate in real-time.

• More comprehensive testing of the system will be performed under a variety of noise conditions.

That is all.
