Automatic Speech Recognition

Types of ASR

Approaches to ASR

ASR (Automatic Speech Recognition)

What Is Voice Recognition?

What Is Voice?

Process of Voice Recognition

Why Are Voices Different?

Components of Sound

How Speech Recognition Works

Applications of Speech Processing

Process of Speech Production

Classification of Speech Sounds

Approaches to Speech Recognition

The voice consists of sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc.


Voice recognition is the process of converting voice into electrical signals. The signals are then transformed into a coding pattern.

The first ASR device was used in 1952 and recognized single digits spoken by a user.

TEMPLATE MATCHING

Template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations.

ASR

Feature Analysis

A more general form of voice recognition is available through feature analysis, and this technique usually leads to "speaker-independent" voice recognition.

Template Matching

• It is speaker dependent.

• It matches the voice against previously saved templates.

• The system must be trained before it is used.

• The user must speak the same words that are available in the templates.

• Recognition accuracy can be about 98 percent.
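
The template-matching points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `TemplateMatcher`, the three-element feature vectors, and the use of plain Euclidean distance are all hypothetical choices, not something the slides specify.

```python
import math


class TemplateMatcher:
    """Toy speaker-dependent recognizer: one stored template per word."""

    def __init__(self):
        self.templates = {}  # word -> feature vector saved during training

    def train(self, word, features):
        # Training step: save the user's own utterance as the template.
        self.templates[word] = list(features)

    def recognize(self, features):
        # Return the word whose saved template is closest to the input.
        if not self.templates:
            raise ValueError("the system must be trained first")
        return min(
            self.templates,
            key=lambda word: math.dist(self.templates[word], features),
        )


matcher = TemplateMatcher()
matcher.train("yes", [0.9, 0.1, 0.4])  # hypothetical feature vectors
matcher.train("no", [0.2, 0.8, 0.5])
print(matcher.recognize([0.85, 0.15, 0.35]))
```

Because the templates come from one user's training utterances, a different speaker's features may land far from every template, which is one way to read the "speaker dependent" limitation.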

Feature Analysis

• It is speaker independent.

• It first processes the voice input using LPC (Linear Predictive Coding).

• It then attempts to find similarities between the expected input and the digitized input.

• Recognition accuracy for speaker-independent systems is somewhat less than for speaker-dependent systems, usually between 90 and 95 percent.
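
As a rough illustration of the LPC step, the sketch below estimates linear-prediction coefficients with the autocorrelation method and the Levinson-Durbin recursion. The function names and the test signal are assumptions made for this example; the slides do not say which LPC variant or model order is used.

```python
def autocorrelation(signal, max_lag):
    """r[k] = sum over n of signal[n] * signal[n + k], for k = 0..max_lag."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def lpc(signal, order):
    """Return predictor coefficients a[1..order] such that
    signal[n] is approximated by sum(a[k] * signal[n - k])."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)
    error = r[0]  # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        if error == 0:
            break
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:]


# Assumed test signal: each sample is 0.9 times the previous one.
decaying = [0.9 ** n for n in range(200)]
print(lpc(decaying, 1))
```

For this signal the single coefficient should come out very close to 0.9, since each sample is predicted almost exactly from its predecessor.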

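Figure: speech production and recognition chain.

Speech production: Text -> Phonemes -> Articulatory Motions -> Acoustic Waveform (speak/say something)

Speech recognition: Acoustic Waveform -> Spectrum Analysis -> Feature Extraction -> Coding -> Phonemes/Words/Sentences -> Semantics

Input types shown in the figure: Discrete Input, Continuous Input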

Vocal Tract

Consists of the laryngeal pharynx, oral pharynx, oral cavity, nasal cavity, and nasal pharynx.

Spectrum Analysis

MFCC (Mel-Frequency Cepstral Coefficients) is used to produce the voice features. DTW (Dynamic Time Warping) is used to select the pattern that matches the database (MATLAB).
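
A minimal sketch of the DTW matching step is shown below, written in Python rather than MATLAB. It assumes each stored pattern is a one-dimensional sequence of numbers; real MFCC features are vectors per frame, so the per-frame distance would change. The names `dtw_distance`, `best_match`, and the toy database are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_match(query, database):
    """Label of the stored pattern with the smallest DTW distance."""
    return min(database, key=lambda label: dtw_distance(query, database[label]))


# Toy database of stored patterns (illustrative values only).
database = {"rise": [0, 1, 2, 3], "fall": [3, 2, 1, 0]}
slow_rise = [0, 0, 1, 2, 2, 3]  # same shape as "rise", spoken more slowly
print(best_match(slow_rise, database))
```

The point of the warping is that the slower utterance still aligns with the "rise" pattern at zero cost, even though the two sequences have different lengths.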

Acoustic Model

Provides the acoustic sounds of a language and can be trained to recognize the characteristics of a particular user's speech patterns and acoustic environment.

To make pattern recognition possible, the PCM signal is transformed into the frequency domain.
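
One way to picture the move from PCM samples to the frequency domain is a discrete Fourier transform over a short frame. The sketch below uses a naive DFT from the standard library for clarity; the 8000 Hz sampling rate, the 64-sample frame, and the function names are assumptions for this example, and a practical system would use an FFT routine instead.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum of a real PCM frame (naive O(N^2) DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to the Nyquist frequency
        acc = sum(
            samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(samples)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(samples)


sample_rate = 8000  # assumed sampling rate in Hz
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
print(dominant_frequency(frame, sample_rate))
```

With these settings each bin is 125 Hz wide, so the 1000 Hz test tone falls exactly on one bin and is reported as the dominant frequency.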

Speaker Dependent

Speaker Independent

Discrete Speech Recognition

Continuous Speech Recognition

Natural Languages

Pitch

Timbre

Harmonics

Loudness

Rhythm

Attack

Sustain

Decay

Speed

COMPRESSION

Regions in which particles are crowded together; they appear as upward curves in the line.

RAREFACTION

Regions in which particles are spread apart; they appear as downward curves in the line.

WAVELENGTH

The distance from the crest of one wave to the crest of the next.

FREQUENCY

The number of waves that pass a point each second.

AMPLITUDE

A measure of the amount of energy in a sound wave.
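
The quantities defined above can be measured on a sampled tone. The sketch below is illustrative only: it assumes a speed of sound of 343 m/s (dry air at about 20 degrees Celsius) to relate frequency to wavelength, and the function names and the 440 Hz test tone are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Crest-to-crest distance for a wave of the given frequency."""
    return speed_m_s / frequency_hz


def peak_amplitude(samples):
    """Largest absolute displacement in the sampled wave."""
    return max(abs(s) for s in samples)


def estimate_frequency_hz(samples, sample_rate):
    """Count upward zero crossings (one per cycle) per second of signal."""
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:]) if prev < 0 <= cur
    )
    return crossings * sample_rate / len(samples)


sample_rate = 8000  # assumed sampling rate in Hz
tone = [
    0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
    for t in range(sample_rate)  # one second of a 440 Hz tone
]
print(estimate_frequency_hz(tone, sample_rate), peak_amplitude(tone))
print(wavelength_m(440.0))
```

The zero-crossing count is only a rough frequency estimate and works best on clean, single-tone signals like this one.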

Figure: a high-frequency sound wave compared with a low-frequency sound wave.

Pitch is how high or low a sound seems.

A bird makes a high-pitched sound.

A lion makes a low-pitched sound.

Voices differ because of intensity (which depends on amplitude), pitch (which depends on frequency), and tone (pleasant or unpleasant).

Divide the sound wave into evenly spaced blocks.

Process each block for important characteristics, such as strength across various frequency ranges, number of zero crossings, and total energy.

Using this characteristic vector, attempt to associate each block with a phone, which is the most basic unit of speech, producing a string of phones.

Find the word whose model is the most likely match to the string of phones which was produced.
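
The first two steps above (block splitting and per-block characteristics) can be sketched as follows. This covers only zero crossings and total energy; strength across frequency ranges and the mapping from vectors to phones are left out. The function names and the tiny sample list are assumptions for illustration.

```python
def split_into_blocks(samples, block_size):
    """Evenly spaced, non-overlapping blocks; a short tail is dropped."""
    return [
        samples[i:i + block_size]
        for i in range(0, len(samples) - block_size + 1, block_size)
    ]


def zero_crossings(block):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))


def total_energy(block):
    """Sum of squared sample values."""
    return sum(s * s for s in block)


def characteristic_vector(block):
    """Two of the characteristics named above, as a small feature vector."""
    return [zero_crossings(block), total_energy(block)]


samples = [1, -1, 1, -1, 2, 2, 2, 2, 5]  # illustrative values only
for block in split_into_blocks(samples, 4):
    print(block, characteristic_vector(block))
```

In this toy input the first block changes sign often but has little energy, while the second never changes sign and has more energy, which is the kind of contrast a later classification step could use.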

Transform the PCM signal into an acoustic representation

Apply the grammar

Figure out which phonemes are spoken

Convert the phonemes into words
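
The last step, turning recognized phonemes into words, can be illustrated with a simple stand-in: pick the lexicon entry whose pronunciation has the smallest edit distance to the recognized phoneme string. This swaps in Levenshtein distance for the probabilistic word models a real recognizer would use, and the lexicon, its phoneme symbols, and the function names are all hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Hypothetical pronunciation lexicon, for illustration only.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "dog": ["D", "AO", "G"],
}


def closest_word(phonemes, lexicon=LEXICON):
    """Word whose pronunciation is nearest to the recognized phonemes."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))


print(closest_word(["D", "AO", "K"]))  # one phoneme misrecognized
```

Even with one misrecognized phoneme, the nearest pronunciation is still "dog"; a grammar could then be applied to rule out words that do not fit the expected phrase.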

Acoustic Phonetic Approach

Pattern Recognition Approach (HMM)

Artificial Intelligence Approach (Neural Networks)

Speech Processing

Analysis/Syntactic Coding

Recognition

Speaker Recognition

Language Identification

Speech Recognition

Speech Mode

• Isolated Speech

• Continuous Speech

Speaker Mode

• Speaker Dependent

• Speaker Independent

• Speaker Adaptive

Vocabulary Size

• Small

• Medium

• Large

Speaking Style

• Dictation

• Spontaneous

Voiced Sound

• The vocal cords play an active role in the production of the sound, e.g. a, e, i.

• It has high frequency.

Unvoiced Sound

• When the vocal cords are inactive, the sound is called an unvoiced sound, e.g. s, f.

• It is built up by pressure.

Speech Coding

Speech Recognition

Speaker Verification/Identification

Speech Enhancement (removing background noise)

Speech Synthesis

Grammar Design

Signal Processing

Phonemic Recognition

Word Recognition

Result Recognition

Education
