Pham Thang Presentation

  • 8/3/2019 Pham Thang Presentation

    Real-Time Speech Recognition

    Thang Pham

    Advisor: Shane Cotter

    Background

    Types of speech recognition systems: Word recognition, Connected speech recognition, Speech understanding systems

    Simplest: user-dependent, limited vocabulary

    Hard to design any system because of:

    Variations of speech, e.g. amplitude, duration, and signal-to-noise ratio

    Background noise

    Reverberation noise

    Implemented in banking, telephone systems, etc. Commercial example: IBM ViaVoice

    Project Outline

    Design a user-dependent speech recognition system to control the movement of a small remote control car

    Limited in vocabulary: Backward, Forward, Left, and Right

    Trained to my voice

    Different speech recognition algorithms were examined to understand the advantages and disadvantages of each system

    Linear Predictive Coding

    Cepstrum Coefficients

    Mel-frequency Cepstrum Coefficients

    System Design

    Microphone

    TI 6713 DSP Board

    Sample word at 8 kHz

    Segment word into time frames

    Find Mel-Cepstrum coefficients for each frame

    Compare input word to a codebook of defined words using dynamic time warping

    Recognized word
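    The "segment word into time frames" step can be sketched as follows. The 8 kHz sample rate comes from the slide; the frame length (256 samples, 32 ms) and the 50% overlap are assumptions chosen for illustration, since the slides do not state them, and `split_into_frames` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 8000   # from the slide
FRAME_LEN = 256         # assumption: 32 ms frames at 8 kHz
HOP = 128               # assumption: 50% overlap between frames


def split_into_frames(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split a list of samples into overlapping frames.

    A trailing partial frame (shorter than frame_len) is dropped.
    """
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# One second of silence stands in for a recorded word.
one_second = [0.0] * SAMPLE_RATE_HZ
frames = split_into_frames(one_second)
print(len(frames), "frames of", FRAME_LEN, "samples")
```

    Each frame would then be passed to the feature-extraction stage independently.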

    Components List

    Texas Instruments TMS320C6713 DSP Board

    Audio-Technica ATR35S Omnidirectional Microphone

    Two step motors

    Linear Predictive Coding

    Provides a good model of the speech signal.

    Can approximate a speech sample at time n from past samples:

    $$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p)$$

    where a_1, a_2, ..., a_p are coefficients that weight each sample.
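    As an illustration of predicting s(n) as a weighted sum of the p previous samples, here is a minimal sketch. The slides do not say how the weights were estimated; this sketch assumes the common autocorrelation method solved with the Levinson-Durbin recursion, and the function names are hypothetical.

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum over n of s(n) * s(n + lag), for lag = 0..max_lag."""
    return [
        sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(signal, order):
    """Estimate a_1..a_p with the Levinson-Durbin recursion.

    Assumes the signal has nonzero energy (r[0] > 0).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]                     # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def predict_sample(past_samples, coeffs):
    """Weighted sum of past samples; past_samples[-1] is s(n-1)."""
    return sum(a_k * past_samples[-k] for k, a_k in enumerate(coeffs, start=1))


# Synthetic test signal that obeys s(n) = 0.9 * s(n-1) exactly.
signal = [0.9 ** n for n in range(200)]
coeffs = lpc_coefficients(signal, order=2)
predicted = predict_sample(signal[:50], coeffs)
print(coeffs, predicted, signal[50])
```

    For this synthetic signal the first coefficient comes out close to 0.9 and the second close to 0, so the prediction tracks the true next sample.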

    Mel-frequency Cepstrum Coefficients

    Research has shown mel-frequency cepstrum coefficients to be better than cepstrum coefficients and LPC

    Modeled around human auditory system (ear)

    $$c_n = \sum_{k=1}^{M} \log(S_k) \cos\left[ n \left(k - 0.5\right) \frac{\pi}{M} \right]$$

    where c_n is the nth-order mel-frequency cepstrum coefficient, S_k is the power of the kth mel filter, and M is the number of mel filters.

    12 mel-frequency cepstrum coefficients characterize each time frame
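    The cosine-transform step, which turns the log power of each mel filter into cepstrum coefficients, can be sketched as below. The count of 12 coefficients comes from the slide; the natural logarithm, the index range n = 1..12, and the choice of 20 filters in the example are assumptions, and `mel_cepstrum` is a hypothetical name.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12):
    """Compute c_n = sum_k log(S_k) * cos(n * (k - 0.5) * pi / M).

    filter_powers holds S_1..S_M (all must be positive).
    Returns [c_1, ..., c_num_coeffs]; starting at n = 1 is an assumption.
    """
    m = len(filter_powers)
    log_powers = [math.log(p) for p in filter_powers]  # log base is an assumption
    coeffs = []
    for n in range(1, num_coeffs + 1):
        total = 0.0
        for k in range(1, m + 1):
            total += log_powers[k - 1] * math.cos(n * (k - 0.5) * math.pi / m)
        coeffs.append(total)
    return coeffs


# A flat spectrum (equal power in every filter) has no spectral shape,
# so every coefficient with n >= 1 should be numerically zero.
flat = mel_cepstrum([2.0] * 20)
print(len(flat), max(abs(c) for c in flat))
```

    The flat-spectrum check is a quick sanity test: only a non-uniform distribution of power across the filters produces nonzero coefficients.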

    Dynamic Time Warping

    Arranged mel-frequency coefficients into vectors

    Use dynamic time warping to find best match

    Compare words that are uttered over different durations.

    You have a reference word that you are listening for

    You have a sampled word

    Want to compare both words, sampled and reference, and see if they match

    Compare mel-frequency cepstrum coefficients for each frame of speech
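    The comparison described above can be sketched as a textbook dynamic time warping routine plus a nearest-template lookup. The slides do not specify the local distance or the allowed warping steps; Euclidean distance and the three standard steps (match, insertion, deletion) are assumptions here, and the toy one-dimensional "feature vectors" stand in for the 12 coefficients per frame.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best alignment between two frame sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch seq_b
                cost[i][j - 1],      # stretch seq_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(sampled, codebook):
    """Return the codebook word whose template is closest under DTW."""
    return min(codebook, key=lambda word: dtw_distance(sampled, codebook[word]))


# Toy codebook: one template per word, one feature per frame.
codebook = {
    "left": [[0.0], [1.0], [2.0]],
    "right": [[5.0], [5.0], [5.0]],
}
slow_left = [[0.0], [0.0], [1.0], [1.0], [2.0]]  # same shape, spoken slower
print(recognize(slow_left, codebook))
```

    Because the slower utterance is only a time-stretched copy of the "left" template, its DTW cost against that template is zero even though the two sequences differ in length.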

    Dynamic Time Warping

    Example of DTW:

    Dynamic Time Warping

    Solution:

    Results

    Word        Recognition Rate
    Backward    50%
    Forward     70%
    Left        90%
    Right       40%

    Sources of error:

    1. Noise, e.g. computer fan, fluorescent light.

    2. Voice changes, e.g. a word spoken on one day might not sound the same on the next day.

    3. Trained on only one template per word.

    Problems Encountered

    Warping frequency domain into mel-frequency, i.e. Log10:

    $$F_{mel} = 2595 \log_{10}\left(1 + \frac{F_{Hz}}{700}\right)$$

    Translation of MATLAB code into C, i.e. dynamic arrays, debugging process

    Dynamic time warping, i.e. theory, algorithm
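    The frequency warping F_mel = 2595 * Log10(1 + F_Hz / 700) is small enough to sketch directly. The inverse function is not on the slide; it is included here only as a convenience for placing filter centre frequencies, and both function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Warp a frequency in Hz onto the mel scale."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(freq_mel):
    """Inverse warp (not on the slide): mel value back to Hz."""
    return 700.0 * (10.0 ** (freq_mel / 2595.0) - 1.0)


# With 8 kHz sampling, the usable band ends at the 4 kHz Nyquist frequency.
nyquist_mel = hz_to_mel(4000.0)
print(hz_to_mel(0.0), nyquist_mel, mel_to_hz(nyquist_mel))
```

    The scale is roughly linear at low frequencies and logarithmic at high ones, which is why equally spaced mel filters become wider in Hz as frequency increases.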

    Future Work

    The C implementation of this system is being developed. The implementation will be uploaded onto the TI 6713 DSP Board once it is completed.

    The code will be modified to allow the recognition system to operate in real-time.

    A more comprehensive testing of the system will be performed under a variety of noise conditions.

    That is all.