sunawar-khan-ahsan
Types of ASR
Approaches to ASR
ASR (Automatic Speech Recognition)
What Is Voice Recognition?
What Is Voice?
Process of Voice Recognition
Why Voices Are Different
Components of Sound
How Speech Recognition Works
Applications of Speech Processing
Process of Speech Production
Classification of Speech Sounds
Approaches to Speech Recognition
The voice consists of sound made by a
human being using the vocal folds for
talking, singing, laughing, crying,
screaming, etc.
Voice recognition is the process of converting voice into
electrical signals.
The signals are then transformed into a CODING
PATTERN.
The first ASR device was used in 1952 and recognized single digits spoken by a
user.
TEMPLATE MATCHING
Template matching is
the simplest technique
and has the highest
accuracy when used
properly, but it also
suffers from the most
limitations.
ASR
Feature Analysis
A more general form
of voice recognition is
available through
feature analysis and
this technique usually
leads to "speaker-
independent" voice
recognition.
•It is SPEAKER DEPENDENT.
•It matches the input voice against previously saved
templates.
•The system must be trained before use.
•The user speaks the same words that are available
in the templates.
•Recognition accuracy can be about 98
percent.
Template Matching
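The template-matching idea above can be sketched as a nearest-template search. This is only a minimal illustration under stated assumptions: the feature vectors, the words "yes" and "no", and the function name `recognize` are hypothetical, and Euclidean distance is an assumed similarity measure rather than the one any particular system uses.

```python
import math

def recognize(features, templates):
    """Return the label of the stored template closest to `features`."""
    best_label, best_dist = None, math.inf
    for label, template in templates.items():
        # Euclidean distance is an assumed similarity measure.
        dist = math.dist(features, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical templates saved while training the system on one speaker.
templates = {
    "yes": [0.9, 0.1, 0.4],
    "no": [0.2, 0.8, 0.5],
}

print(recognize([0.85, 0.15, 0.35], templates))
```

Because the templates come from a single speaker, a different voice would likely land far from every stored template, which is one way to read the "speaker dependent" limitation.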
•It is SPEAKER INDEPENDENT.
•First processes the given voice input
•using LPC (Linear Predictive Coding).
•Attempts to find similarities between the
expected
input and the digitized input.
•Recognition accuracy for
speaker-independent systems is
somewhat less than for
speaker-dependent systems, usually
between 90 and 95 percent.
Feature Analysis
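[Figure: speech production and recognition chain]
Production: Text -> Phonemes -> Articulatory Motions -> Speak/Say Something -> Acoustic Waveform
Recognition: Acoustic Waveform -> Spectrum Analysis -> Feature Extraction -> Coding -> Phonemes/Words/Sentences -> Semantics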
Discrete Input Continuous Input
Vocal Tract
Consists of the laryngeal pharynx, oral
pharynx, oral cavity, nasal cavity, and
nasal pharynx.
Spectrum Analysis
MFCCs (mel-frequency cepstral coefficients) are used to produce voice
features. DTW (dynamic time warping) selects the pattern
that matches the database (MATLAB).
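A rough sketch of how DTW can pick the best-matching pattern from a database is given here. It makes several simplifying assumptions: real systems compare sequences of MFCC vectors, while this example uses plain numbers as stand-in features, and the names `dtw_distance`, `best_match`, and the database entries are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def best_match(query, database):
    """Return the name of the stored pattern with the smallest DTW distance."""
    return min(database, key=lambda name: dtw_distance(query, database[name]))

# Hypothetical stored patterns (stand-ins for MFCC sequences).
database = {
    "rising": [1.0, 2.0, 3.0, 4.0],
    "falling": [4.0, 3.0, 2.0, 1.0],
}

# The query is a slower version of "rising"; DTW absorbs the timing change.
query = [1.0, 1.0, 2.0, 3.0, 3.0, 4.0]
print(best_match(query, database))
```

The point of the warping step is that the same word spoken faster or slower still aligns with its stored pattern.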
Acoustic Model
Provides the acoustic sounds of a language
and can be trained to recognize the characteristics of a
particular user's speech patterns and
acoustic environment.
To make pattern recognition easier, the PCM signal
is transformed into the frequency domain.
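As an illustration of moving PCM samples into the frequency domain, the sketch below applies a naive discrete Fourier transform to a synthetic tone. The sample rate, tone frequency, block size, and the function name `dft_magnitudes` are assumptions chosen for the example; practical systems would use a fast Fourier transform instead of this direct sum.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin (0 .. n/2) for a list of real PCM samples."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags

sample_rate = 8000  # assumed sampling rate in Hz
n = 64              # assumed block size in samples
tone_hz = 1000      # assumed test tone

# Synthetic PCM block containing a single sine tone.
pcm = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]

mags = dft_magnitudes(pcm)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n
print(peak_hz)
```

In the time domain the block is just 64 numbers; in the frequency domain the energy collects in a single bin, which is much easier to compare against stored patterns.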
Speaker Dependent
Speaker Independent
Discrete Speech Recognition
Continuous Speech Recognition
Natural Languages
Pitch
Timbre
Harmonics
Loudness
Rhythm
Attack
Sustain
Decay
Speed
COMPRESSION
in which particles are crowded
together, appear as upward curves in
the line.
RAREFACTION
in which particles are spread apart,
appear as downward curves in the line.
WAVELENGTH
this is the distance from the crest of one
wave to the crest of the next.
FREQUENCY
this is the number of waves that
pass a point in each second.
AMPLITUDE
this is the measure of the amount
of energy in a sound wave.
[Figure: High Frequency Sound Wave vs. Low Frequency Sound Wave]
Pitch is how high or low a sound seems.
A bird makes a high pitch.
A lion makes a low pitch.
Voices differ because of
INTENSITY (depends on amplitude),
PITCH (depends on frequency),
TONE (pleasant or unpleasant).
Divide the sound wave into evenly spaced blocks.
Process each block for important characteristics, such as strength across various frequency ranges, number of zero crossings, and total energy.
Using this characteristic vector, attempt to associate each block with a phone, which is the most basic unit of speech, producing a string of phones.
Find the word whose model is the most likely match to the string of phones which was produced.
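The first two steps above (splitting into blocks and extracting characteristics) can be sketched as follows. This is a simplified example: it computes only total energy and zero crossings, leaving out strength across frequency ranges, and the function name `frame_features`, the block length, and the sample values are hypothetical.

```python
def frame_features(samples, frame_len):
    """Split samples into evenly spaced blocks and compute simple features."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Total energy: sum of squared sample values in the block.
        energy = sum(x * x for x in frame)
        # Zero crossings: sign changes between neighbouring samples.
        zero_crossings = sum(
            1 for prev, cur in zip(frame, frame[1:]) if (prev < 0) != (cur < 0)
        )
        features.append({"energy": energy, "zero_crossings": zero_crossings})
    return features

# Hypothetical samples: a rapidly alternating block, then a constant block.
samples = [1, -1, 1, -1, 2, 2, 2, 2]
for feat in frame_features(samples, 4):
    print(feat)
```

Each dictionary plays the role of the characteristic vector for one block; a recognizer would then try to associate each vector with a phone.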
Transform the PCM signal into acoustic features
Apply the GRAMMAR
Figure out which PHONEMES are spoken
Convert the PHONEMES into WORDS
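The last step, turning a recognized phoneme string into a word, can be illustrated with a toy lookup. This sketch assumes a tiny hypothetical pronunciation lexicon and uses edit distance as a stand-in for the statistical word models a real recognizer would use; the phoneme symbols and the function names are illustrative only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

# Hypothetical pronunciation lexicon: word -> phoneme sequence.
lexicon = {
    "cat": ["k", "ae", "t"],
    "cap": ["k", "ae", "p"],
    "dog": ["d", "ao", "g"],
}

def phonemes_to_word(phonemes):
    """Return the lexicon word whose phonemes are closest to the input."""
    return min(lexicon, key=lambda w: edit_distance(phonemes, lexicon[w]))

print(phonemes_to_word(["k", "ae", "t"]))
print(phonemes_to_word(["d", "ao", "k"]))  # one misrecognized phoneme
```

Choosing the closest entry rather than requiring an exact match lets the lookup tolerate an occasional misrecognized phoneme.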
Acoustic Phonetic Approach
Pattern Recognition Approach (HMM)
Artificial Intelligence Approach (Neural Networks)
Speech Processing
Analysis/Syntactic Coding
Recognition
Speaker Recognition
Language Identification
Speech Recognition
Speech Mode
•Isolated Speech
•Continuous Speech
Speaker Mode
•Speaker Dependent
•Speaker Independent
•Speaker Adaptive
Vocabulary Size
•Small
•Medium
•Large
Speaking Style
•Dictation
•Spontaneous
•The vocal cords play an active role in the
production of the SOUND,
e.g. a/e/i
•It has high frequency
Voiced Sound
•When the vocal cords are inactive, the sound is
called an UNVOICED SOUND,
e.g. s/f
•It is built up by pressure
Unvoiced Sound
Speech Coding
Speech Recognition
Speech Verification/Identification
Speech Enhancement (removing background noise)
Speech Synthesis
Grammar Design
Signal Processing
Phonemic Recognition
Word Recognition
Result Recognition