
7-Speech Recognition

Speech Recognition Concepts
Speech Recognition Approaches
Recognition Theories
Bayes Rule
Simple Language Model
P(A|W) Network Types

7-Speech Recognition (Cont'd)

HMM Calculating Approaches
Neural Components
Three Basic HMM Problems
Viterbi Algorithm
State Duration Modeling
Training In HMM

Recognition Tasks

Isolated Word Recognition (IWR)
Connected Word (CW) and Continuous Speech Recognition (CSR)
Speaker Dependent, Multiple Speaker, and Speaker Independent
Vocabulary Size
– Small: <20
– Medium: >100, <1000
– Large: >1000, <10000
– Very Large: >10000

Speech Recognition Concepts

[Diagram: Speech Understanding = Speech Processing followed by NLP (speech → text); Speech Synthesis = NLP followed by Speech Processing (text → phone sequence → speech); Speech Recognition maps speech to text.]

Speech recognition is the inverse of speech synthesis.

Speech Recognition Approaches

Bottom-Up Approach
Top-Down Approach
Blackboard Approach

Bottom-Up Approach

[Diagram: processing pipeline from signal to recognized utterance — Signal Processing → Feature Extraction → Segmentation (Voiced/Unvoiced/Silence) → Sound Classification Rules → Phonotactic Rules → Lexical Access → Language Model → Recognized Utterance, with knowledge sources applied at each level.]

Top-Down Approach

[Diagram: Feature Analysis → Unit Matching System (inventory of speech recognition units) → Lexical Hypothesis (word dictionary) → Syntactic Hypothesis (grammar) → Semantic Hypothesis (task model) → Utterance Verifier/Matcher → Recognized Utterance.]

Blackboard Approach

[Diagram: Environmental, Acoustic, Lexical, Syntactic, and Semantic processes all communicating through a shared Blackboard.]

Recognition Theories

Articulatory-Based Recognition
– Uses the articulatory system for recognition
– This theory has been the most successful so far

Auditory-Based Recognition
– Uses the auditory system for recognition

Hybrid-Based Recognition
– A hybrid of the above theories

Motor Theory
– Models the intended gesture of the speaker

Recognition Problem

We have a sequence of acoustic symbols and we want to find the words expressed by the speaker.

Solution: find the most probable word sequence given the acoustic symbols.

Recognition Problem

A : Acoustic Symbols
W : Word Sequence

We should find \hat{W} such that

P(\hat{W} | A) = \max_W P(W | A)

Bayes Rule

P(x | y) P(y) = P(x, y)

P(x | y) = \frac{P(y | x) P(x)}{P(y)}

P(W | A) = \frac{P(A | W) P(W)}{P(A)}

Bayes Rule (Cont'd)

P(\hat{W} | A) = \max_W P(W | A) = \max_W \frac{P(A | W) P(W)}{P(A)}

\hat{W} = \arg\max_W P(W | A) = \arg\max_W P(A | W) P(W)

Simple Language Model

W = w_1 w_2 w_3 ... w_n

P(W) = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) P(w_4 | w_1, w_2, w_3) ... P(w_n | w_1, w_2, ..., w_{n-1})
     = \prod_{i=1}^{n} P(w_i | w_1, ..., w_{i-1})

Computing this probability directly is very difficult and requires a very large database, so Trigram and Bigram models are used instead.

Simple Language Model (Cont'd)

Trigram:  P(W) = \prod_{i} P(w_i | w_{i-1}, w_{i-2})

Bigram:   P(W) = \prod_{i} P(w_i | w_{i-1})

Monogram: P(W) = \prod_{i} P(w_i)

Simple Language Model (Cont'd)

P(w_3 | w_1 w_2)

Computing method:
P(w_3 | w_1 w_2) = (number of occurrences of w_3 after w_1 w_2) / (total number of occurrences of w_1 w_2)

Ad-hoc (interpolation) method, with interpolation weights \alpha_1, \alpha_2, \alpha_3:
P(w_3 | w_1 w_2) \approx \alpha_1 f(w_3 | w_1 w_2) + \alpha_2 f(w_3 | w_2) + \alpha_3 f(w_3)
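A minimal sketch of how these relative-frequency estimates and the ad-hoc interpolation could be computed from counts; the toy corpus and the interpolation weights are illustrative assumptions, not part of the slides:

```python
from collections import Counter

# Toy corpus; in practice this would be a very large text database.
corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = len(corpus)

def f_bigram(w2, w1):
    """f(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

def f_trigram(w3, w1, w2):
    """f(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2)."""
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)] if bigrams[(w1, w2)] else 0.0

def p_interpolated(w3, w1, w2, a=(0.6, 0.3, 0.1)):
    """Ad-hoc interpolation: a1*f(w3|w1 w2) + a2*f(w3|w2) + a3*f(w3)."""
    return (a[0] * f_trigram(w3, w1, w2)
            + a[1] * f_bigram(w3, w2)
            + a[2] * unigrams[w3] / total)

print(p_interpolated("sat", "the", "cat"))
```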

Error Production Factors

Prosody (recognition should be prosody independent)
Noise (noise should be prevented)
Spontaneous Speech

P(A|W) Computing Approaches

Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Hybrid Systems

Dynamic Time Warping Method (DTW)

To obtain a global distance between two speech patterns, a time alignment must be performed.

Example: a time alignment path between a template pattern "SPEECH" and a noisy input "SsPEEhH".
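A minimal DTW sketch computing such a global distance; the sequences are 1-D for simplicity and the absolute-difference local distance is an illustrative assumption:

```python
import numpy as np

def dtw_distance(x, y):
    """Global distance between two sequences via dynamic time warping."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])           # local distance
            D[i, j] = cost + min(D[i - 1, j],          # insertion
                                 D[i, j - 1],          # deletion
                                 D[i - 1, j - 1])      # match
    return D[n, m]

template = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
noisy_input = np.array([1.0, 1.5, 2.0, 3.0, 3.0, 2.0, 1.0])
print(dtw_distance(template, noisy_input))
```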

Artificial Neural Network

y = \varphi\left( \sum_{i=0}^{N-1} w_i x_i \right)

[Diagram: inputs x_0, x_1, ..., x_{N-1} weighted by w_0, w_1, ..., w_{N-1}, summed and passed through \varphi to give y.]

Simple computation element of a neural network
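A sketch of this computation element; the slides do not specify the nonlinearity \varphi, so a sigmoid is assumed here:

```python
import numpy as np

def neuron(x, w):
    """Simple computation element: weighted sum followed by a nonlinearity."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))   # sigmoid chosen as phi

x = np.array([0.5, -1.0, 2.0])   # inputs x_0 .. x_{N-1}
w = np.array([0.1, 0.4, -0.3])   # weights w_0 .. w_{N-1}
print(neuron(x, w))
```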

Artificial Neural Network (Cont'd)

Neural Network Types
– Perceptron
– Time Delay
– Time Delay Neural Network (TDNN) Computational Element

Artificial Neural Network (Cont'd)

[Diagram: Single Layer Perceptron with inputs x_0, ..., x_{N-1} and outputs y_0, ..., y_{M-1}.]

Artificial Neural Network (Cont'd)

[Diagram: Three Layer Perceptron.]

Hybrid Methods

Hybrid Neural Network and Matched Filter for Recognition

[Diagram: speech → acoustic features → delays → pattern classifier → output units.]

Neural Network Properties

The system is simple, but many training iterations are needed
Does not impose a specific structure
Despite its simplicity, the results are good
The training set is large, so training should be done offline
Accuracy is relatively good

Hidden Markov Model

Observations: O_1, O_2, ...
States in time: q_1, q_2, ...
All states: s_1, s_2, ..., s_N

O = O_1, O_2, O_3, ..., O_t
q = q_1, q_2, q_3, ..., q_t

[Diagram: states S_i and S_j with transition probabilities a_{ij} and a_{ji}.]

Hidden Markov Model (Cont'd)

Discrete Markov Model

P(q_t = s_j | q_{t-1} = s_i, q_{t-2} = s_k, ...) = P(q_t = s_j | q_{t-1} = s_i)

Degree 1 (first-order) Markov Model

Hidden Markov Model (Cont'd)

a_{ij} = P(q_t = s_j | q_{t-1} = s_i)

a_{ij} : transition probability from S_i to S_j,  1 \le i, j \le N

Discrete Markov Model Example

S1 : The weather is rainy
S2 : The weather is cloudy
S3 : The weather is sunny

A = {a_{ij}} =

            rainy  cloudy  sunny
  rainy      0.4    0.3     0.3
  cloudy     0.2    0.6     0.2
  sunny      0.1    0.1     0.8

Hidden Markov Model Example (Cont'd)

Question 1: What is the probability of the sequence
Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy?

q_1 q_2 ... q_8 = s_3 s_3 s_3 s_1 s_1 s_3 s_2 s_2

P = \pi_3 \, a_{33} a_{33} a_{31} a_{11} a_{13} a_{32} a_{22}

Taking \pi_3 = 1 (day 1 is given to be sunny):

P = 0.8 \times 0.8 \times 0.1 \times 0.4 \times 0.3 \times 0.1 \times 0.6 \approx 4.6 \times 10^{-4}
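The same product, evaluated from the transition matrix reconstructed above; the state ordering 0 = rainy, 1 = cloudy, 2 = sunny is an assumed convention:

```python
import numpy as np

A = np.array([[0.4, 0.3, 0.3],    # rainy
              [0.2, 0.6, 0.2],    # cloudy
              [0.1, 0.1, 0.8]])   # sunny

# Sunny-Sunny-Sunny-Rainy-Rainy-Sunny-Cloudy-Cloudy, with pi_sunny = 1
states = [2, 2, 2, 0, 0, 2, 1, 1]
p = 1.0
for s, s_next in zip(states, states[1:]):
    p *= A[s, s_next]
print(p)   # ~4.6e-4
```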

Hidden Markov Model Example (Cont'd)

Question 2: What is the probability of staying in state S_i for d days?

\pi_i = P(q_1 = s_i),  1 \le i \le N : probability of being in state i at time t = 1

P(q_1 = s_i, ..., q_d = s_i, q_{d+1} \ne s_i) = \pi_i \, a_{ii}^{d-1} (1 - a_{ii}) = \pi_i \, P_i(d)

Discrete Density HMM Components

N : number of states
M : number of outputs (observation symbols)
A (N×N) : state transition probability matrix
B (N×M) : output occurrence probability in each state
\pi (1×N) : initial state probability

\lambda = (A, B, \pi) : set of HMM parameters
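A minimal sketch of how the parameter set \lambda = (A, B, \pi) of a discrete-density HMM might be held in code; the sizes N = 3 and M = 2 and all values are arbitrary examples:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class DiscreteHMM:
    A: np.ndarray    # (N, N) state transition probabilities
    B: np.ndarray    # (N, M) output probabilities per state
    pi: np.ndarray   # (N,)   initial state probabilities

# Example with N = 3 states and M = 2 output symbols.
hmm = DiscreteHMM(
    A=np.array([[0.7, 0.2, 0.1],
                [0.1, 0.8, 0.1],
                [0.2, 0.2, 0.6]]),
    B=np.array([[0.9, 0.1],
                [0.5, 0.5],
                [0.1, 0.9]]),
    pi=np.array([0.6, 0.3, 0.1]),
)
```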

Three Basic HMM Problems

Given an HMM \lambda and a sequence of observations O, what is the probability P(O | \lambda)?

Given a model \lambda and a sequence of observations O, what is the most likely state sequence in the model that produced the observations?

Given a model \lambda and a sequence of observations O, how should we adjust the model parameters in order to maximize P(O | \lambda)?

First Problem Solution

P(O | q, \lambda) = \prod_{t=1}^{T} P(o_t | q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(o_t)

P(q | \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}

We know that:

P(x, y) = P(x | y) P(y)

and

P(x, y | z) = P(x | y, z) P(y | z)

First Problem Solution (Cont'd)

P(O, q | \lambda) = P(O | q, \lambda) P(q | \lambda)

P(O, q | \lambda) = \pi_{q_1} b_{q_1}(o_1) \, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)

P(O | \lambda) = \sum_{q} P(O, q | \lambda)
              = \sum_{q_1, q_2, ..., q_T} \pi_{q_1} b_{q_1}(o_1) \, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)

Computation order: O(2T N^T)

Forward Backward Approach

\alpha_t(i) = P(o_1, o_2, ..., o_t, q_t = i | \lambda)

Computing \alpha_t(i):

1) Initialization:
\alpha_1(i) = \pi_i b_i(o_1),  1 \le i \le N

Forward Backward Approach (Cont'd)

2) Induction:
\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] b_j(o_{t+1}),  1 \le t \le T-1,  1 \le j \le N

3) Termination:
P(O | \lambda) = \sum_{i=1}^{N} \alpha_T(i)

Computation order: O(N^2 T)
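A minimal forward-pass sketch following the initialization/induction/termination steps above; the toy A, B, \pi, and observation sequence are illustrative assumptions:

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(o_1..o_{t+1}, q_{t+1} = i | lambda); returns P(O | lambda)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):                             # induction
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum(), alpha                     # termination

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
obs = [0, 1, 0]                                       # observation symbol indices
p, alpha = forward(A, B, pi, obs)
print(p)
```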

Backward Variable

\beta_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, \lambda)

1) Initialization:
\beta_T(i) = 1,  1 \le i \le N

2) Induction:
\beta_t(i) = \sum_{j=1}^{N} a_{ij} b_j(o_{t+1}) \beta_{t+1}(j),  t = T-1, T-2, ..., 1,  1 \le i \le N
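The corresponding backward-pass sketch, under the same assumed toy parameters as the forward sketch:

```python
import numpy as np

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+2}..o_T | q_{t+1} = i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                    # induction, t = T-1 .. 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 0]
print(backward(A, B, obs))
```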

Second Problem Solution

Finding the most likely state sequence

\gamma_t(i) = P(q_t = i | O, \lambda) = \frac{P(O, q_t = i | \lambda)}{P(O | \lambda)}
            = \frac{P(O, q_t = i | \lambda)}{\sum_{i=1}^{N} P(O, q_t = i | \lambda)}
            = \frac{\alpha_t(i) \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i) \beta_t(i)}

Individually most likely state:
q_t^* = \arg\max_{1 \le i \le N} [\gamma_t(i)],  1 \le t \le T
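A short sketch of this posterior computation; the hard-coded \alpha and \beta values are assumed to come from the forward/backward sketches above for the same toy model and obs = [0, 1, 0]:

```python
import numpy as np

def posterior_states(alpha, beta):
    """gamma[t, i] = alpha_t(i) beta_t(i) / sum_i alpha_t(i) beta_t(i)."""
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, gamma.argmax(axis=1)   # individually most likely state per t

# (T, N) arrays produced by the forward/backward sketches above.
alpha = np.array([[0.54, 0.08], [0.041, 0.168], [0.0863, 0.0226]])
beta  = np.array([[0.1635, 0.258], [0.69, 0.48], [1.0, 1.0]])
gamma, best = posterior_states(alpha, beta)
print(gamma)
print(best)
```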

Viterbi Algorithm

Define:

\delta_t(i) = \max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | \lambda),  1 \le i \le N

\delta_t(i) is the probability of the most likely state sequence that ends in state i at time t, given the observations o_1, ..., o_t.

Viterbi Algorithm (Cont'd)

\delta_{t+1}(j) = \left[ \max_i \delta_t(i) a_{ij} \right] b_j(o_{t+1})

1) Initialization:
\delta_1(i) = \pi_i b_i(o_1),  1 \le i \le N
\psi_1(i) = 0

\psi_t(i) is the most likely state at time t-1 preceding state i at time t (the backtracking pointer).

Viterbi Algorithm (Cont'd)

2) Recursion:
\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) a_{ij}] \, b_j(o_t),  2 \le t \le T,  1 \le j \le N

\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) a_{ij}]

Viterbi Algorithm (Cont'd)

3) Termination:
P^* = \max_{1 \le i \le N} [\delta_T(i)]
q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]

4) Backtracking:
q_t^* = \psi_{t+1}(q_{t+1}^*),  t = T-1, T-2, ..., 1

Third Problem Solution

Parameter estimation using the Baum-Welch or Expectation Maximization (EM) approach

Define:

\xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, \lambda)
            = \frac{P(O, q_t = i, q_{t+1} = j | \lambda)}{P(O | \lambda)}
            = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}

Third Problem Solution (Cont'd)

\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)

\sum_{t=1}^{T-1} \gamma_t(i) : expected number of transitions from state i

\sum_{t=1}^{T-1} \xi_t(i, j) : expected number of transitions from state i to state j

Third Problem Solution (Cont'd)

\bar{\pi}_i = \gamma_1(i)

\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}

\bar{b}_j(k) = \frac{\sum_{t=1, \, o_t = V_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
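A single Baum-Welch reestimation step as a sketch, computing \xi and \gamma from the forward/backward variables and applying the update formulas above; the toy parameters and observation sequence are assumptions, and scaling for long sequences is omitted:

```python
import numpy as np

def baum_welch_step(A, B, pi, obs):
    """One EM iteration of the reestimation formulas for a discrete HMM."""
    N, M, T = A.shape[0], B.shape[1], len(obs)

    # Forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_obs = alpha[-1].sum()

    # xi[t, i, j] and gamma[t, i].
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()
    gamma = alpha * beta / p_obs

    # Reestimation formulas.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(M):
        mask = np.array(obs) == k                 # times where o_t = V_k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(baum_welch_step(A, B, pi, [0, 1, 0, 0, 1]))
```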

Baum Auxiliary Function

Q(\lambda', \lambda) = \sum_{q} P(O, q | \lambda') \log P(O, q | \lambda)

If Q(\lambda', \lambda) \ge Q(\lambda', \lambda')  then  P(O | \lambda) \ge P(O | \lambda')

With this approach we reach a local optimum.

Restrictions Of Reestimation Formulas

\sum_{i=1}^{N} \bar{\pi}_i = 1

\sum_{j=1}^{N} \bar{a}_{ij} = 1,  1 \le i \le N

\sum_{k=1}^{M} \bar{b}_j(k) = 1,  1 \le j \le N

Continuous Observation Density

We have the values of a PDF instead of b_j(k) = P(o_t = V_k | q_t = j)

We have:

b_j(o) = \sum_{k=1}^{M} C_{jk} \, N(o; \mu_{jk}, \Sigma_{jk}),   \int b_j(o) \, do = 1

C_{jk} : mixture coefficients
\mu_{jk} : means
\Sigma_{jk} : covariances

Continuous Observation Density (Cont'd)

Mixture in HMM

Dominant mixture:
b_j(o) \approx \max_k C_{jk} \, N(o; \mu_{jk}, \Sigma_{jk})

[Diagram: states S1, S2, S3, each with mixtures M1|1 ... M4|1, M1|2 ... M4|2, M1|3 ... M4|3; the dominant mixture is selected per state.]

Continuous Observation Density (Cont'd)

Model Parameters:

\lambda = (A, C, \mu, \Sigma, \pi)

A : N×N,  C : N×M,  \mu : N×M×K,  \Sigma : N×M×K×K,  \pi : 1×N

N : number of states
M : number of mixtures in each state
K : dimension of the observation vector
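A sketch of the continuous emission density b_j(o) = \sum_k C_{jk} N(o; \mu_{jk}, \Sigma_{jk}) for one state; diagonal covariances and all parameter values are illustrative assumptions:

```python
import numpy as np

def gaussian_mixture_density(o, C, mu, var):
    """b_j(o) = sum_k C[k] * N(o; mu[k], diag(var[k])) for a single state j."""
    K = o.shape[0]
    diff = o - mu                                           # (M, K)
    norm = (2 * np.pi) ** (K / 2) * np.sqrt(var.prod(axis=1))
    exponent = -0.5 * (diff ** 2 / var).sum(axis=1)
    return float((C * np.exp(exponent) / norm).sum())

# One state with M = 2 mixtures over K = 2-dimensional observations.
C = np.array([0.6, 0.4])                    # mixture coefficients, sum to 1
mu = np.array([[0.0, 0.0], [2.0, 1.0]])     # means, shape (M, K)
var = np.array([[1.0, 1.0], [0.5, 0.5]])    # diagonal variances, shape (M, K)
print(gaussian_mixture_density(np.array([0.5, 0.2]), C, mu, var))
```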

Continuous Observation Density (Cont'd)

\bar{C}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j, k)}

\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k) \, o_t}{\sum_{t=1}^{T} \gamma_t(j, k)}

Continuous Observation Density (Cont'd)

\bar{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j, k) (o_t - \mu_{jk})(o_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j, k)}

\gamma_t(j, k) : probability of being in state j with the k-th mixture at time t

State Duration Modeling

[Diagram: states S_i and S_j with self-transitions a_{ii}, a_{jj} and transitions a_{ij}, a_{ji}.]

Probability of staying d times in state i:

P_i(d) = a_{ii}^{d-1} (1 - a_{ii})
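A small sketch of this geometric duration distribution implied by a standard self-transition; the value of a_{ii} is an assumption:

```python
import numpy as np

a_ii = 0.8                                   # self-transition probability of state i
d = np.arange(1, 11)
P_d = a_ii ** (d - 1) * (1 - a_ii)           # P_i(d) for d = 1..10
print(dict(zip(d.tolist(), P_d.round(4).tolist())))
print("expected duration:", 1 / (1 - a_ii))  # mean of the geometric distribution
```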

State Duration Modeling (Cont'd)

[Diagram: HMM with explicit state duration — states S_i and S_j, each with its own duration distribution P_i(d), P_j(d), connected by a_{ij} and a_{ji}.]

State Duration Modeling (Cont'd)

HMM generation, considering state duration:
– Select the initial state q_1 = i using the initial probabilities \pi_i
– Select the duration d_1 using the duration distribution P_{q_1}(d_1)
– Select the observation sequence O_1, O_2, ..., O_{d_1} using b_{q_1}(O_1, O_2, ..., O_{d_1}); in practice we assume the following independence:

  b_{q_1}(O_1, O_2, ..., O_{d_1}) = \prod_{t=1}^{d_1} b_{q_1}(O_t)

– Select the next state q_2 = j using the transition probabilities a_{q_1 q_2}. We also have an additional constraint: a_{q_1 q_1} = 0 (no self-transitions).

Training In HMM

Maximum Likelihood (ML)
Maximum Mutual Information (MMI)
Minimum Discrimination Information (MDI)

Training In HMM

Maximum Likelihood (ML)

P(O | \lambda_1), P(O | \lambda_2), P(O | \lambda_3), ..., P(O | \lambda_n)

P^* = \max_{v} [P(O | \lambda_v)]

[Diagram: an observation sequence is scored against each model \lambda_v and the model with the maximum likelihood is selected.]

Training In HMM (Cont'd)

Maximum Mutual Information (MMI)

Mutual information:

I(O, \lambda) = \log \frac{P(O, \lambda)}{P(O) P(\lambda)}

I(O, \lambda) = \log P(O | \lambda) - \log \sum_{w=1}^{V} P(O | \lambda_w) P(\lambda_w)

\lambda = \{\lambda_v\}

Training In HMM (Cont'd)

Minimum Discrimination Information (MDI)

I(Q : P_\lambda) = \int q(o) \log \frac{q(o)}{P(o | \lambda)} \, do

Observation: O = (O_1, O_2, ..., O_T)
Autocorrelation: R = (R_1, R_2, ..., R_t)

\nu(R, P_\lambda) = \inf_{Q \in Q(R)} I(Q : P_\lambda)