CVSR COLLEGE OF ENGINEERING
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
TECHNICAL SEMINAR ON
SPEECH TO TEXT CONVERSION
BY
Y.RAJENDER REDDY(08H61A04C5)
INTRODUCTION
Speech recognition is the process of capturing spoken words with a
microphone or telephone and converting them into a digitally stored set of
words.
Speech-to-text conversion is one application of speech recognition.
A speech-to-text system improves accessibility by providing data entry
options for blind, deaf, or physically handicapped users.
BLOCK DIAGRAM
Speech acquisition
Speech preprocessing
Hidden Markov model
Text storage
External Hardware
Through Microphone
SPEECH ACQUISITION
The microphone input port with the audio codec receives the signal, amplifies
it, and converts it into 16-bit PCM digital samples at a sampling rate of 8 kHz.
The system needs a parallel/serial interface to the Nios II processor and an
application running on the processor that acquires and stores data in memory.
The received samples are stored into memory on the Altera Development and
Education (DE2) board.
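To make the sample format concrete, the following is a minimal sketch of what 16-bit PCM at 8 kHz looks like in memory. It is not the board's acquisition code: the function names `to_pcm16` and `acquire`, the synthetic 440 Hz tone that stands in for the microphone, and the little-endian byte order are all assumptions made for illustration.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
BITS_PER_SAMPLE = 16    # 16-bit PCM, as stated in the text


def to_pcm16(x):
    """Quantize a float in [-1.0, 1.0] to a signed 16-bit integer."""
    x = max(-1.0, min(1.0, x))      # clip out-of-range input
    return int(round(x * 32767))


def acquire(duration_s, tone_hz=440.0):
    """Hypothetical stand-in for the codec: PCM samples of a synthetic tone."""
    n = int(duration_s * SAMPLE_RATE_HZ)
    return [
        to_pcm16(0.5 * math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE_HZ))
        for i in range(n)
    ]


samples = acquire(1.0)
# Pack as little-endian signed 16-bit values (byte order is an assumption).
buffer = struct.pack("<%dh" % len(samples), *samples)
bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8

print(len(samples), len(buffer), bytes_per_second)
```

At these settings one second of speech is 8000 samples, so the memory budget grows by 16000 bytes for every second recorded.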
SPEECH PREPROCESSING
Preprocessing involves taking the speech samples as input, blocking the
samples into frames, and returning a unique pattern for each sample.
The unique pattern is obtained through the following steps:
1. The digital samples are divided into overlapped frames.
2. The system checks the frames for voice activity using endpoint detection
and energy threshold calculations.
3. The speech samples are passed through a pre-emphasis filter.
4. The frames with voice activity are passed through a Hamming window.
5. The system finds linear predictive coding (LPC) coefficients for the frames.
6. From the LPC coefficients, the system determines the cepstral coefficients.
The cepstral coefficients serve as feature vectors.
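The six steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the seminar's implementation: the frame length (240), hop (80), pre-emphasis factor (0.95), LPC order (8), number of cepstral coefficients (12), and energy threshold are values chosen for the example, and the Levinson-Durbin recursion is one common way to obtain LPC coefficients that the slides do not name. Samples are assumed to be floats scaled to [-1, 1].

```python
import math
import random


def split_frames(x, frame_len=240, hop=80):
    """Step 1: overlapping frames; each starts `hop` samples after the last."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def has_voice(frame, threshold):
    """Step 2: keep a frame only if its mean energy exceeds the threshold."""
    return sum(v * v for v in frame) / len(frame) > threshold


def pre_emphasis(x, alpha=0.95):
    """Step 3: y[n] = x[n] - alpha * x[n-1], which boosts high frequencies."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(frame):
    """Step 4: taper the frame edges with a Hamming window."""
    n = len(frame)
    return [v * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, v in enumerate(frame)]


def autocorr(frame, order):
    """Autocorrelation lags r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Step 5: LPC coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Step 6: c[m] = a[m] + sum_{k=1}^{m-1} (k/m) * c[k] * a[m-k]."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


def extract_features(samples, order=8, n_ceps=12, energy_threshold=1e-3):
    """Run steps 1-6 and return one cepstral feature vector per voiced frame."""
    features = []
    for frame in split_frames(samples):
        if not has_voice(frame, energy_threshold):
            continue  # silent frames never reach the LPC stage
        frame = hamming(pre_emphasis(frame))
        a, _ = levinson_durbin(autocorr(frame, order), order)
        features.append(lpc_to_cepstrum(a, n_ceps))
    return features


# Synthetic test signal: silence, a noisy 300 Hz tone, then silence again.
rng = random.Random(0)
tone = [0.5 * math.sin(2 * math.pi * 300 * n / 8000) + 0.05 * rng.uniform(-1, 1)
        for n in range(800)]
features = extract_features([0.0] * 400 + tone + [0.0] * 400)
print(len(features), len(features[0]))
```

Silent frames are dropped before the LPC stage, which also avoids dividing by a zero-energy autocorrelation in `levinson_durbin`.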
HIDDEN MARKOV MODEL
A hidden Markov model (HMM) is used for speech recognition, which converts
speech to text.
This stage consists of three parts:
• Training
• HMM-Based recognition
• Digit Models
TRAINING
Training involves creating a pattern representative of the features of a class
using one or more test patterns that correspond to speech sounds of the same
class.
An important part of speech-to-text conversion using pattern recognition
is training.
HMM-BASED RECOGNITION
Recognition is the process of comparing the unknown test pattern
with each sound class reference pattern and computing a measure of
similarity between the test pattern and each reference pattern.
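For an HMM, a natural measure of similarity is the probability that a model generates the observed sequence. The sketch below computes that probability with the forward algorithm for a discrete HMM. The slides do not state which scoring method was used, so treat this as one standard option; the two-state model parameters (`start`, `trans`, `emit`) are made-up values for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    obs   : list of observation symbol indices
    start : start[s] is the probability of beginning in state s
    trans : trans[p][s] is the probability of moving from state p to state s
    emit  : emit[s][o] is the probability of state s emitting symbol o
    """
    n_states = len(start)
    # Initialisation: probability of starting in s and emitting obs[0].
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    # Induction: extend every partial path by one observation.
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    # Termination: sum over all states the sequence could end in.
    return sum(alpha)


# Hypothetical two-state, left-to-right model with two output symbols.
start = [1.0, 0.0]
trans = [[0.5, 0.5],
         [0.0, 1.0]]
emit = [[0.9, 0.1],
        [0.2, 0.8]]

likelihood = forward_likelihood([0, 1], start, trans, emit)
print(likelihood)
```

For long utterances the raw product underflows, so practical implementations usually rescale `alpha` at every step or work with log probabilities.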
DIGIT MODELS
The input speech sample is preprocessed and the feature
vector is extracted.
Then, the index of nearest codebook vector for each frame
is sent to all digit models.
The model with the maximum probability is chosen as the
recognized digit.
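The three steps above can be sketched as follows: each feature vector is mapped to the index of its nearest codebook vector, the index sequence is scored by every digit model, and the highest-scoring model wins. Everything concrete here is an assumption for illustration: the two-entry codebook, the two toy models labelled `digit_a` and `digit_b`, squared Euclidean distance as the nearness measure, and a scaled forward algorithm as the scoring method.

```python
import math


def nearest_index(vector, codebook):
    """Index of the codebook vector closest to `vector` (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def log_likelihood(obs, model):
    """log P(obs | model) via the forward algorithm with per-step scaling."""
    start, trans, emit = model
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def recognize(feature_vectors, codebook, models):
    """Quantize each frame, score every model, return (best label, scores)."""
    obs = [nearest_index(v, codebook) for v in feature_vectors]
    scores = {label: log_likelihood(obs, m) for label, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical codebook and models; each model is (start, trans, emit).
codebook = [[0.0, 0.0], [1.0, 1.0]]
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "digit_a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "digit_b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.1], [1.0, 0.8]]
best, scores = recognize(frames, codebook, models)
print(best, scores)
```

Comparing log likelihoods picks the same winner as comparing raw probabilities, because the logarithm is monotonic.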
TEXT STORAGE
The Nios II processor on the DE2 board sends the digital speech data to a PC.
A target program running on the PC receives the text and writes it to the
disk.
APPLICATIONS
Interactive voice response system (IVRS)
Voice-dialing in mobile phones and telephones
Hands-free dialing in wireless Bluetooth headsets
PIN and numeric password entry modules
Automated teller machines (ATMs)
REFERENCES
1. Topic taken from seminartopics.co.in/ece-seminar-topics/
2. Garg, Mohit. Linear Prediction Algorithms. Indian Institute of Technology,
Bombay, India, Apr 2003.
3. Li, Gongjun and Taiyi Huang. An Improved Training Algorithm in HMM-Based
Speech Recognition. National Laboratory of Pattern Recognition,
Chinese Academy of Sciences, Beijing.
4. Altera Nios II document.
THANK YOU
QUERIES