A Framework for Speech Recognition Development


  • 7/30/2019 A Framework for Speech Recognition Development

    1/23

    A Framework for Speech Application Development

    By

    Jason Elroy Martis

    NMAMIT Nitte


    Agenda

    Introduction

    Applications

    Working and Types

    FSM

    Problems

    Proposed Solution

    Results

    Conclusion

    References


    Normal Ways of Interaction

    Everyday interaction takes two basic forms:

    Language

    Meta-language (body language)

    Both forms occur simultaneously, which makes the interaction richer.


    Language Communicated Through Speech

    Language is communicated in the form of speech.

    What is speech?

    Speech is the vocalized form of human communication.

    It is based on the syntactic combination of lexicals and names that are drawn from vocabularies.

    It is the most natural way for us to interact.

    Example: "Hey! How are you?"


    Hence Speech Recognition (SR)

    Speech recognition is the process of converting a speech signal into a sequence of lexicals by means of an algorithm.

    That is, we give an instruction as a speech signal and the computer recognizes it.

    Is this necessary?

    Of course: it lets us communicate with the electronic or virtual world in our natural way.


    Applications of SR

    There are innumerable applications. Some are:

    Military uses

    Remote command and control centers (planes, satellites, etc.)

    Health care

    Automated medical prescriptions (wow!)

    Educational uses

    Helps both teachers and students


    So How Does SR Work?

    A very simple model demonstrates how SR works.


    Approaches to SR

    Broadly divided into three:

    Acoustic-phonetic approach (works on phonemes)

    Pattern recognition approach (works on patterns)

    Artificial intelligence approach (advanced functionality)


    Acoustic-Phonetic Approach

    Requires knowledge of phonetics (the language of enunciation).

    Recognize phonemes, convert them to lexicals, and match those to words.
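
The phoneme-to-word step can be illustrated with a small lookup sketch. This is only an assumption about how such matching might look: the dictionary `PRONUNCIATIONS`, its ARPAbet-style symbols, and the function `phonemes_to_words` are hypothetical and do not come from the slides. It assumes the phonemes have already been recognized from the audio.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> word.
# The ARPAbet-style symbols and the entries are illustrative only.
PRONUNCIATIONS = {
    ("HH", "EH", "L", "OW"): "hello",
    ("HH", "AW"): "how",
    ("AA", "R"): "are",
    ("Y", "UW"): "you",
}


def phonemes_to_words(phonemes):
    """Greedily match the longest known phoneme run at each position."""
    words = []
    i = 0
    max_len = max(len(key) for key in PRONUNCIATIONS)
    while i < len(phonemes):
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in PRONUNCIATIONS:
                words.append(PRONUNCIATIONS[candidate])
                i += length
                break
        else:
            # No dictionary entry starts here: mark it and move on.
            words.append("<unk>")
            i += 1
    return words
```

For example, `phonemes_to_words(["HH", "AW", "AA", "R", "Y", "UW"])` yields the words "how", "are", "you". A real system would also have to handle uncertain phoneme boundaries, which this sketch ignores.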


    Pattern Recognition

    Works in two phases:

    Pattern training

    Comparison

    Pattern training is modeled by an FSM (finite state machine).

    In simple terms, speech templates are created and stored.

    The speaker's recognized words are compared against the stored templates and verified.

    If matched: accept

    If not matched: reject
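
The two phases above can be sketched as follows. This is a minimal illustration, not the method used in the slides: it assumes each utterance has already been reduced to a fixed-length feature vector, it uses plain Euclidean distance, and the class `TemplateMatcher` and its `threshold` parameter are hypothetical names.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TemplateMatcher:
    def __init__(self, threshold):
        self.threshold = threshold  # assumed tuning parameter
        self.templates = {}         # word -> stored feature vector

    def train(self, word, features):
        """Pattern training: store one template per word."""
        self.templates[word] = list(features)

    def recognize(self, features):
        """Comparison: return the closest word, or None to reject."""
        best_word, best_dist = None, float("inf")
        for word, template in self.templates.items():
            d = distance(features, template)
            if d < best_dist:
                best_word, best_dist = word, d
        return best_word if best_dist <= self.threshold else None
```

After `train("yes", [1.0, 0.0])` and `train("no", [0.0, 1.0])` with a threshold of 0.5, an input close to a stored template is accepted as that word, and an input far from every template is rejected with `None`.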


    Pattern Recognition (Contd.)

    Model:

    Problem: different accents can cause recognition problems.


    Artificial Intelligence Approach

    This approach overcomes some disadvantages of the template-based approach.

    Maintains a knowledge base

    Automatically corrects words

    E.g. "What your name?" (error!)

    It overcomes some problems of speaker variance and other constraints of speech,

    e.g. culture, accent, etc.


    Speech Recognition Model


    Finite State Machine Based SR Model

    It is a very simple approach.

    Two main stages are present:

    The acceptor

    The transducer

    The acceptor accepts or rejects lexicals.

    The transducer handles the transition from one set of words to another as the input grows.
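
A rough sketch of the two stages described above, under stated assumptions: the slides do not show how the machine is built, so the trie-shaped state graph, the class name `PhraseFSM`, and both method names are hypothetical. `accepts` plays the acceptor role, and `candidates_after` shows how the set of possible next words changes as the input grows.

```python
class PhraseFSM:
    """Word-level finite state machine built from a list of known phrases."""

    def __init__(self, phrases):
        # State graph stored as a trie: state -> {word: next_state}.
        self.transitions = {0: {}}
        self.final_states = set()
        for phrase in phrases:
            state = 0
            for word in phrase.split():
                nxt = self.transitions[state].get(word)
                if nxt is None:
                    nxt = len(self.transitions)
                    self.transitions[state][word] = nxt
                    self.transitions[nxt] = {}
                state = nxt
            self.final_states.add(state)

    def _walk(self, words):
        """Follow the transitions; return the reached state or None."""
        state = 0
        for word in words:
            state = self.transitions[state].get(word)
            if state is None:
                return None
        return state

    def accepts(self, words):
        """Acceptor: True only if the words form a complete known phrase."""
        return self._walk(words) in self.final_states

    def candidates_after(self, words):
        """Words that may follow the input so far (empty if rejected)."""
        state = self._walk(words)
        if state is None:
            return []
        return sorted(self.transitions[state])
```

With the phrases "open file", "open window", and "close file", the input `["open"]` is not yet accepted, and the candidate set narrows from "close"/"open" to "file"/"window" once "open" has been heard.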


    FSM Based SR Model (Contd.)

    What if a match is ambiguous (two words sound the same)?

    "Know" and "no" sound the same. How do we overcome this problem?

    Solution: we can attach weights to them to improve recognition.
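
One way to picture the weighting idea is to score each same-sounding candidate by how well it fits the previous word. The sketch below is only an illustration: the weight values, the tables `WEIGHTS` and `HOMOPHONES`, and the function `pick_word` are invented for the example and are not taken from the slides.

```python
# Hypothetical transition weights: (previous word, candidate) -> weight.
WEIGHTS = {
    ("i", "know"): 0.9,
    ("i", "no"): 0.1,
    ("say", "no"): 0.8,
    ("say", "know"): 0.2,
}

# Hypothetical table: recognized sound -> words that share that sound.
HOMOPHONES = {"N OW": ["know", "no"]}


def pick_word(previous, sound, default_weight=0.0):
    """Choose the candidate with the highest weight after `previous`."""
    candidates = HOMOPHONES.get(sound, [])
    if not candidates:
        return None
    return max(candidates,
               key=lambda w: WEIGHTS.get((previous, w), default_weight))
```

Here `pick_word("i", "N OW")` prefers "know", while `pick_word("say", "N OW")` prefers "no". In practice such weights would be estimated from data rather than written by hand.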


    Performance of Speech-Based Systems

    The performance of a speech system is measured on two main bases:

    WER (word error rate)

    WRR (word recognition rate)

    WER indicates how many words are recognized incorrectly.

    WRR indicates how many words are recognized correctly.
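
The slides do not give formulas for these measures. A common convention is WER = (S + D + I) / N, where S, D, and I count substituted, deleted, and inserted words and N is the number of words in the reference transcript, with WRR = 1 - WER. The sketch below follows that convention using a word-level edit distance; the function names are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def word_recognition_rate(reference, hypothesis):
    """WRR under the convention WRR = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For the reference "hey how are you" and the hypothesis "hey who are you", one of four words is substituted, so the WER is 0.25 and the WRR is 0.75. Note that with many inserted words the WER can exceed 1, which makes this WRR negative.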


    So What Is New in This?

    There is nothing new here: speech recognition has developed from almost nothing to being everywhere.

    Everyone is attracted to it and is developing lots of apps on it.

    This causes an integrity issue:

    All apps are built from scratch.

    There can be app conflicts (two different apps on the same computer).

    Both apps wait for the same word and conflict on the same machine.

    Licensing on these machines (an ordinary developer can do nothing but sit silently until an SDK comes). Yuck!


    How Can We Solve This?

    We combine both of these approaches:

    Allow developers to build from scratch (this keeps them independent).

    Provide a platform where they can work together.

    So,

    why not build a framework where users can build things easily and still from scratch? We don't lose anything, and we resolve the integrity issues.


    What Does This Framework Look Like?

    Notice how the integrity issue is resolved and apps are developed easily.
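
The framework diagram is not available here, so the following is only a guess at one piece of such a framework: a shared registry through which every app claims its spoken commands, so that two apps can no longer wait for the same word unnoticed. The names `SpeechFramework`, `CommandConflict`, `register`, and `dispatch` are all hypothetical.

```python
class CommandConflict(Exception):
    """Raised when two apps try to claim the same spoken command."""


class SpeechFramework:
    def __init__(self):
        self._owners = {}    # command word -> name of the owning app
        self._handlers = {}  # command word -> callback to run

    def register(self, app, command, handler):
        """Claim a command for an app, refusing a claim by a second app."""
        command = command.lower()
        owner = self._owners.get(command)
        if owner is not None and owner != app:
            raise CommandConflict(
                f"'{command}' is already claimed by {owner}")
        self._owners[command] = app
        self._handlers[command] = handler

    def dispatch(self, recognized_word):
        """Route a recognized word to its handler, or return None."""
        handler = self._handlers.get(recognized_word.lower())
        if handler is None:
            return None
        return handler()
```

An app would call `register("player", "play", start_playback)` once at startup, and the recognizer would call `dispatch` for every recognized word. A second app registering "play" gets a `CommandConflict` immediately instead of a silent clash at run time.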


    Results

    Notice how the type of speech affects the accuracy:

    Type of Speech              Accuracy
    Normal dictionary speech    50-90%
    Choices (customized)        90%
    Choices (general)           80%
    Individual letters          30%
    Customized phonetics        70%


    Conclusion

    Speech is a natural way of communication.

    Speech has numerous applications.

    There are various approaches, each with its own pros and cons.

    FSMs are one way to make the job easier and better.

    There are many problems:

    Recognition problems

    Integrity issues

    So we need a platform-independent framework that can solve these issues and make the life of speech developers easier.


    References

    [1] Weinstein C.J., Military and government applications of human-machine communication by voice. In Proceedings of the Natl. Acad. Sci. USA, Volume 92, 10011-10016, October 1995.

    [2] Dat Tat Tran, Fuzzy Approaches to Speech and Speaker Recognition, a thesis submitted for the degree of Doctor of Philosophy of the University of Canberra.

    [3] R.K. Moore, Twenty things we still don't know about speech, Proc. CRIM/FORWISS Workshop on Progress and Prospects of Speech Research and Technology, 1994.

    [4] Sadaoki Furui, 50 years of Progress in Speech and Speaker Recognition Research, ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.

    [5] Willie Walker et al., Sphinx-4: A Flexible Open Source Framework for Speech Recognition.
    http://cmusphinx.sourceforge.net/sphinx4

    [6] M.A. Anusuya, Speech Recognition by Machine: A Review. In (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
    http://arxiv.org/ftp/arxiv/papers/1001/1001.2267.pdf

    [7] Neann Mathai, A Literature Survey of Speech Recognition and Hidden Markov Models.
    http://shenzi.cs.uct.ac.za/~honsproj/cgi-bin/view/2009/katz_mathai_sobey.zip/Speech_Katz_Mathai_Sobey/Downloads/NeannMathaiLiteratureSurvey.pdf

    [8] Pavel Stemberk, Speech recognition based on FSM and HTK toolkits.
    http://stembep.wz.cz/!papers/Zilina-dt04/zildt04.pdf

    [9] Steve Renals, Speech recognition.
    http://dsp-book.narod.ru/rec-notes.pdf

