Real-Time Speech Recognition
Thang Pham
Advisor: Shane Cotter
Background
Types of speech recognition systems: Word recognition, Connected speech recognition, Speech understanding systems
Simplest: user-dependent limited vocabulary
Hard to design any system because of:
• Variations in speech, e.g. amplitude, duration, and signal-to-noise ratio
• Background noise
• Reverberation noise
Implemented in banking, telephone services, etc. Example product: IBM ViaVoice
Project Outline
Design a user-dependent speech recognition system to control the movement of a small remote control car
Limited in vocabulary: Backward, Forward, Left, and Right
Trained to my voice
Different speech recognition algorithms were examined to understand the advantages and disadvantages of each system
• Linear Predictive Coding
• Cepstrum Coefficients
• Mel-frequency Cepstrum Coefficients
System Design
Microphone
TI 6713 DSP Board
Sample word at 8 kHz
Segment word into time frames
Find Mel-Cepstrum coefficients for each frame
Compare input word to a codebook of defined words using dynamic time warping
Recognized word
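The framing step in the chain above can be sketched as follows. This is a minimal illustration, not the project's code: the slides give only the 8 kHz sampling rate, so the frame length (256 samples) and hop (128 samples) are assumptions, and `split_into_frames` is a hypothetical helper name.

```python
def split_into_frames(samples, frame_len=256, hop=128):
    """Split a list of samples into overlapping time frames.

    frame_len and hop are illustrative assumptions (32 ms frames with
    50% overlap at 8 kHz). Trailing samples that do not fill a whole
    frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


SAMPLE_RATE_HZ = 8000  # sampling rate stated in the system design

# One second of silence stands in for a recorded word.
one_second = [0.0] * SAMPLE_RATE_HZ
frames = split_into_frames(one_second)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame would then be passed to the feature-extraction step, so a word becomes a sequence of per-frame coefficient vectors.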
Components List
Texas Instruments TMS320C6713 DSP Board
Audio Technica Omnidirectional Microphone ATR35S
Two stepper motors
Linear Predictive Coding
Provides a good model of the speech signal.
Can approximate a speech sample at time n from past samples:

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$

where a_1, a_2, …, a_p are coefficients that weight each sample.
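The slides do not say how the weights were estimated. One common approach is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below under that assumption. The function names and the default order of 10 are illustrative, not taken from the project.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of s(n) * s(n - lag)."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor weights a_1..a_p from autocorrelation values.

    Returns (weights, residual_energy) for the model
    s(n) ~ a_1 s(n-1) + ... + a_p s(n-p).
    """
    a = [0.0] * (order + 1)  # a[0] is unused so indices match a_1..a_p
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or perfectly predicted frame: stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


def lpc(frame, order=10):
    """Estimate LPC weights for one frame (order=10 is an assumption)."""
    return levinson_durbin(autocorrelation(frame, order), order)


# Autocorrelation of an ideal first-order process with weight 0.5.
weights, residual = levinson_durbin([1.0, 0.5, 0.25], 2)
print(weights, residual)
```

The residual energy shrinks as the order grows, which is one way to judge how many past samples are worth using.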
Mel-frequency Cepstrum Coefficients
Research has shown mel-frequency cepstrum coefficients to perform better in speech recognition than plain cepstrum coefficients and LPC
Modeled on the human auditory system (ear)
$$c_n = \sum_{k=1}^{M} \log(S_k) \cos\left[ n \left(k - 0.5\right) \frac{\pi}{M} \right]$$

where c_n is the nth order mel-frequency cepstrum, S_k is the power of the kth mel filter, and M is the number of mel filters.
12 mel-frequency cepstrum coefficients characterize each time frame
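The cosine sum can be evaluated directly once the mel filter powers for a frame are known. The sketch below rests on several assumptions: the base-10 logarithm (the slides write only "Log"), coefficients numbered n = 1 to 12, and a small floor on the powers to avoid taking the log of zero. `mel_cepstrum` is a hypothetical name.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12, floor=1e-12):
    """Compute c_1..c_num_coeffs from the powers S_1..S_M of M mel filters.

    Assumptions: base-10 log, coefficients start at n = 1, and powers
    are floored at `floor` so silent filters do not cause a math error.
    """
    m = len(filter_powers)
    log_powers = [math.log10(max(p, floor)) for p in filter_powers]
    coeffs = []
    for n in range(1, num_coeffs + 1):
        total = 0.0
        for k in range(1, m + 1):
            total += log_powers[k - 1] * math.cos(n * (k - 0.5) * math.pi / m)
        coeffs.append(total)
    return coeffs


# A flat spectrum (equal power in every filter) has no cepstral shape,
# so every coefficient for n >= 1 comes out numerically zero.
flat = mel_cepstrum([2.0] * 20)
print(max(abs(c) for c in flat))
```

The resulting 12 values form the feature vector for one time frame.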
Dynamic Time Warping
Arrange the mel-frequency cepstrum coefficients of each frame into a vector
Use dynamic time warping to find the best match
Allows comparison of words that are uttered at different speeds, so they span different numbers of time frames
You have a reference word that you are listening for
You have a sampled word
Want to compare both words, sampled and reference, and see if they match
Compare the mel-frequency cepstrum coefficients for each frame of speech
Example of DTW and its solution: [figures not preserved]
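The matching step can be sketched with the textbook DTW recursion. The Euclidean frame distance, the unconstrained step pattern, and the names `dtw_distance` and `recognize` are assumptions for illustration, since the slides do not give these details.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(sampled, reference):
    """Accumulated cost of the best alignment between two frame sequences."""
    n, m = len(sampled), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i sampled frames
    # with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(sampled[i - 1], reference[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # sampled word lingers on this reference frame
                cost[i][j - 1],      # reference word lingers on this sampled frame
                cost[i - 1][j - 1],  # both words advance together
            )
    return cost[n][m]


def recognize(sampled, codebook):
    """Return the codebook word whose template has the lowest DTW cost."""
    return min(codebook, key=lambda word: dtw_distance(sampled, codebook[word]))


# Toy one-dimensional "features": the sampled word is a slower
# utterance of the "left" template.
codebook = {"left": [[0.0], [1.0], [2.0]], "right": [[2.0], [1.0], [0.0]]}
slow_left = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(recognize(slow_left, codebook))
```

Because the alignment may repeat frames, the slower utterance still matches its template with zero cost, which a frame-by-frame comparison would not achieve.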
Results
Word        Recognition Rate
Backward    50%
Forward     70%
Left        90%
Right       40%
Sources of error:
1. Noise, e.g. computer fan, fluorescent light.
2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day.
3. Each word was trained from only one template.
Problems Encountered
Warping the frequency domain onto the mel-frequency scale, i.e. the Log10 mapping.
Translating the MATLAB code into C, e.g. dynamic arrays and the debugging process
Dynamic time warping, i.e. its theory and algorithm
$$F_{mel} = 2595 \log_{10}\left(1 + \frac{F_{Hz}}{700}\right)$$
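The warping formula translates directly into code. The sketch below also shows one way it might be used to place filter centers evenly on the mel scale. The filter count of 20 and the 4000 Hz upper limit (half the 8 kHz sampling rate) are assumptions, and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz onto the mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(num_filters=20, max_hz=4000.0):
    """Center frequencies in Hz, evenly spaced on the mel scale.

    num_filters and max_hz are illustrative assumptions.
    """
    step = hz_to_mel(max_hz) / (num_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(num_filters)]


for center in mel_filter_centers(5):
    print(round(center, 1))
```

The centers crowd together at low frequencies and spread out at high ones, which mirrors the ear's finer resolution at low pitch.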
Future Work
• The C implementation of this system is being developed. The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.
• The code will be modified to allow the recognition system to operate in real-time.
• More comprehensive testing of the system will be performed under a variety of noise conditions.
That is all.