Automatic Speech Recognition

Page 1: Automatic Speech Recognition
Page 2: Automatic Speech Recognition

Types of ASR

Approaches to ASR

What is ASR (Automatic Speech Recognition)?

What is voice recognition?

What is voice?

Process of voice recognition

Why are voices different?

Components of sound

How does speech recognition work?

Page 3: Automatic Speech Recognition

Applications of speech processing

Process of speech production

Classification of speech sounds

Approaches to speech recognition

Page 4: Automatic Speech Recognition

The voice consists of sound made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc.

Page 5: Automatic Speech Recognition

Voice recognition is the process of converting voice into electrical signals.

The signals are then transformed into a coding pattern.

Page 6: Automatic Speech Recognition

The first ASR device was used in 1952 and recognized single digits spoken by a user.

Page 7: Automatic Speech Recognition

ASR

TEMPLATE MATCHING

Template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations.

FEATURE ANALYSIS

A more general form of voice recognition is available through feature analysis, and this technique usually leads to "speaker-independent" voice recognition.

Page 8: Automatic Speech Recognition

Template Matching

• It is SPEAKER DEPENDENT.

• It matches the input voice against previously saved templates.

• The system must be trained before it can be used.

• The user speaks the same words that are available in the templates.

• Recognition accuracy can be about 98 percent.

Feature Analysis

• It is SPEAKER INDEPENDENT.

• It first processes the voice given as input, using LPC (Linear Predictive Coding).

• It attempts to find similarities between the expected input and the digitized input.

• Recognition accuracy for speaker-independent systems is somewhat less than for speaker-dependent systems, usually between 90 and 95 percent.

Page 9: Automatic Speech Recognition

Text → Phonemes → Articulatory Motions → Speak / Say Something → Acoustic Waveform

Acoustic Waveform → Spectrum Analysis → Feature Extraction → Coding → Phonemes / Words / Sentences → Semantics

Discrete Input, Continuous Input

Page 10: Automatic Speech Recognition

Vocal Tract

Consists of the laryngeal pharynx, oral pharynx, oral cavity, nasal cavity, and nasal pharynx.

Spectrum Analysis

MFCC (Mel-Frequency Cepstral Coefficients) is used to produce the voice features. DTW (Dynamic Time Warping) is used to select the pattern that matches the database (MATLAB).
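The DTW matching step can be sketched as follows. This is a minimal illustration in Python rather than MATLAB, and it assumes the features have already been extracted. The one-dimensional "feature vectors", the template words, and the function names `dtw_distance` and `best_match` are all made up for the example and stand in for real MFCC frames.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = smallest accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being compared
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(features, templates):
    """Return the template word whose stored pattern is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical stored templates (toy 1-D features, not real MFCCs)
templates = {
    "yes": [[1.0], [2.0], [3.0], [2.0]],
    "no": [[3.0], [1.0], [0.0], [1.0]],
}

# The same shape as "yes", but spoken more slowly (some frames repeated)
spoken = [[1.0], [1.0], [2.0], [3.0], [3.0], [2.0]]

print(best_match(spoken, templates))
```

Because DTW allows frames to be repeated during alignment, the slower utterance still lines up with the shorter stored template, which is why it suits words spoken at different speeds.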

Page 11: Automatic Speech Recognition

Acoustic Model

An acoustic model represents the acoustic sounds of a language and can be trained to recognize the characteristics of a particular user's speech patterns and acoustic environment.

Page 12: Automatic Speech Recognition

To make pattern recognition possible, the PCM (pulse-code modulation) signal is transformed into the frequency domain.
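One way to picture this transformation is a discrete Fourier transform (DFT) over a short block of samples. The sketch below is plain Python for illustration only. The sample rate, block length, and test tone are assumptions, the samples are floats rather than real integer PCM values, and a practical system would use an FFT library instead of this direct formula.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude spectrum of a real signal for bins 0 .. N/2 (direct DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin frequency k
        s = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
        mags.append(abs(s))
    return mags


# Assumed values for the example
sample_rate = 8000   # samples per second
n = 64               # block length, so each bin spans 8000 / 64 = 125 Hz
tone_hz = 1000       # test tone that falls exactly on bin 8

# Stand-in for a block of PCM samples: one pure tone
pcm = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]

mags = dft_magnitudes(pcm)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n
print(peak_bin, peak_hz)
```

In the time domain the block is just 64 numbers. In the frequency domain the same block shows a single strong peak at the tone's frequency, which is a much easier pattern to compare.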

Page 13: Automatic Speech Recognition

Speaker Dependent

Speaker Independent

Discrete Speech Recognition

Continuous Speech Recognition

Natural Languages

Page 14: Automatic Speech Recognition

Pitch

Timbre

Harmonics

Loudness

Rhythm

Attack

Sustain

Decay

Speed

Page 15: Automatic Speech Recognition
Page 16: Automatic Speech Recognition

COMPRESSION

Regions in which particles are crowded together. They appear as upward curves in the line.

RAREFACTION

Regions in which particles are spread apart. They appear as downward curves in the line.

WAVELENGTH

This is the distance from the crest of one wave to the crest of the next.

Page 17: Automatic Speech Recognition

FREQUENCY

This is the number of waves that pass a point in each second, measured in hertz (Hz).

AMPLITUDE

This is the measure of the amount of energy in a sound wave.
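Wavelength and frequency are tied together by the speed of the wave: speed = frequency × wavelength. The short Python sketch below works through that relation. The speed of sound used (about 343 m/s in air at room temperature) is an assumed value, and the function names are made up for the example. The last function relies on the standard result that the energy carried by a sound wave grows with the square of its amplitude.

```python
import math

# Assumed value: approximate speed of sound in air at about 20 degrees Celsius
SPEED_OF_SOUND_M_S = 343.0


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Crest-to-crest distance in metres: wavelength = speed / frequency."""
    return speed_m_s / frequency_hz


def period_s(frequency_hz):
    """Time taken by one complete wave to pass a point, in seconds."""
    return 1.0 / frequency_hz


def level_change_db(amplitude, reference_amplitude):
    """Level difference in decibels between two amplitudes.

    Energy is proportional to amplitude squared, hence the factor 20.
    """
    return 20.0 * math.log10(amplitude / reference_amplitude)


print(wavelength_m(100.0))         # a low tone has a long wavelength
print(wavelength_m(4000.0))        # a high tone has a short wavelength
print(period_s(100.0))
print(level_change_db(2.0, 1.0))   # effect of doubling the amplitude
```

Higher frequency means shorter wavelength at the same speed, and doubling the amplitude raises the level by roughly 6 dB.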

Page 18: Automatic Speech Recognition

[Figure: high-frequency sound wave and low-frequency sound wave]

Pitch is how high or low a sound seems.

A bird makes a high-pitched sound.

A lion makes a low-pitched sound.

Page 19: Automatic Speech Recognition

Voices differ because of:

INTENSITY (depends on amplitude),

PITCH (depends on frequency),

TONE (pleasant or unpleasant).

Page 20: Automatic Speech Recognition

Divide the sound wave into evenly spaced blocks.

Process each block for important characteristics, such as strength across various frequency ranges, number of zero crossings, and total energy.

Using this characteristic vector, attempt to associate each block with a phone, which is the most basic unit of speech, producing a string of phones.

Find the word whose model is the most likely match to the string of phones which was produced.
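The first two steps above can be sketched in a few lines of Python. This is only an illustration: it computes total energy and the number of zero crossings for each block, and leaves out the per-frequency-range strengths as well as the later phone and word matching. The block length, sample rate, and test signal are assumptions chosen for the example.

```python
import math


def frame_features(samples, frame_len):
    """Split a signal into equal blocks and return (energy, zero_crossings) per block."""
    features = []
    # Step through the signal one full block at a time (no overlap, tail dropped)
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Total energy: sum of squared sample values
        energy = sum(x * x for x in frame)
        # Zero crossings: how often consecutive samples differ in sign
        zero_crossings = sum(
            1 for prev, cur in zip(frame, frame[1:]) if (prev < 0) != (cur < 0)
        )
        features.append((energy, zero_crossings))
    return features


# Assumed values for the example
sample_rate = 8000
frame_len = 80  # 10 ms blocks at 8 kHz

# Block 1: loud low-pitched tone (100 Hz). Block 2: quiet high-pitched tone (2000 Hz).
low = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(frame_len)]
high = [0.5 * math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(frame_len)]

features = frame_features(low + high, frame_len)
for energy, crossings in features:
    print(round(energy, 2), crossings)
```

The loud low tone gives a block with high energy and few zero crossings, while the quiet high tone gives lower energy and many crossings. Vectors like these are what the later steps would try to associate with phones.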

Page 21: Automatic Speech Recognition

Transform the PCM signal into an acoustic representation.

Apply the grammar.

Figure out which phonemes are spoken.

Convert the phonemes into words.

Page 22: Automatic Speech Recognition

Acoustic Phonetic Approach

Pattern Recognition Approach (HMM)

Artificial Intelligence Approach (Neural Networks)

Page 23: Automatic Speech Recognition
Page 24: Automatic Speech Recognition
Page 25: Automatic Speech Recognition
Page 26: Automatic Speech Recognition

Speech Processing: Analysis/Synthesis, Recognition, Coding

Recognition: Speech Recognition, Speaker Recognition, Language Identification

Speech Recognition

Speech Mode

• Isolated Speech

• Continuous Speech

Speaker Mode

• Speaker Dependent

• Speaker Independent

• Speaker Adaptive

Vocabulary Size

• Small

• Medium

• Large

Speaking Style

• Dictation

• Spontaneous

Page 27: Automatic Speech Recognition

Voiced Sound

• The vocal cords play an active role in the production of the sound, e.g. /a/, /e/, /i/.

• It is periodic, with high energy concentrated at lower frequencies.

Unvoiced Sound

• The vocal cords are inactive, e.g. /s/, /f/.

• It is produced by a build-up of air pressure.

Page 28: Automatic Speech Recognition

Speech Coding

Speech Recognition

Speech Verification/Identification

Speech Enhancement (removing background noise)

Speech Synthesis

Page 29: Automatic Speech Recognition

Grammar Design

Signal Processing

Phonemic Recognition

Word Recognition

Result Recognition