Automated Speech Recognition
By: Amichai Painsky
Automated Speech Recognition - setup
• Input – speech waveform
• Preprocessing
• Modeling
• Output – transcription: "The boy is in the red house"
ASR - basics
• Observations $O = o_1, o_2, \ldots, o_T$ representing a speech signal
• Vocabulary $V$ of different words
• Our goal – find the most likely word sequence $\hat{W} = \arg\max_{W} P(W \mid O)$
• Since $P(W \mid O) = \frac{P(O \mid W)\,P(W)}{P(O)}$, we have $\hat{W} = \arg\max_{W} P(O \mid W)\,P(W)$, where $P(W)$ is the language modeling term and $P(O \mid W)$ is the acoustic modeling term (see the toy sketch below)
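As a toy illustration of this decision rule (all probabilities below are made up, and the candidate set is hypothetical), decoding picks the candidate sentence that maximizes the product of the acoustic and language scores:

```python
# Hypothetical scores: P(O|W) from the acoustic model, P(W) from the language model
candidates = {
    "the boy is in the red house":  (0.020, 1e-6),
    "the buoy is in the red house": (0.022, 1e-9),
    "the boy is in the red mouse":  (0.015, 1e-8),
}
best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
print(best)  # -> "the boy is in the red house"
```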
Observations preprocessing
• A sampled waveform is converted into a sequence of parameter vectors at a certain frame rate
• A frame rate of 10 ms is usually taken, because a speech signal is assumed to be stationary for about 10 ms
• Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, and psychophysical knowledge of human perception; a minimal sketch of the framing step follows
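This is a minimal framing sketch (assuming a 16 kHz mono signal; the 25 ms window and 10 ms hop are common but illustrative choices), not a full feature extractor:

```python
import numpy as np

def frame_signal(waveform, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D waveform into overlapping frames (25 ms window, 10 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)       # samples between frame starts
    n_frames = 1 + max(0, (len(waveform) - frame_len) // hop_len)
    # Each frame would then be mapped to a feature vector (e.g. MFCCs)
    return np.stack([waveform[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])

# One second of a 440 Hz tone -> 98 frames of 400 samples each
t = np.arange(16000) / 16000
print(frame_signal(np.sin(2 * np.pi * 440 * t)).shape)  # (98, 400)
```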
Language modeling
• Most generally, by the chain rule, the probability of a sequence of $m$ words is $P(w_1, \ldots, w_m) = P(w_1) \prod_{i=2}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$
• Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models approximate $P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})$ (see the sketch after this list)
• More powerful trigram models use a two-word history: $P(w_i \mid w_{i-2}, w_{i-1})$
• Longer history -> exponentially increasing number of models -> more data is required to train, more parameters, more overfitting
• Prediction by Partial Matching (PPM) models back off between histories of different lengths
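A minimal bigram sketch (toy corpus, maximum-likelihood counts, no smoothing, which a real language model would need):

```python
from collections import Counter

corpus = "the boy is in the red house . the boy is in the house .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    """Maximum-likelihood estimate P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("boy", "the"))   # 2/4 = 0.5
print(p_bigram("red", "the"))   # 1/4 = 0.25
```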
Acoustic modeling
• Determines what sound is pronounced when a given sentence is uttered
• Number of possibilities is infinite! (it depends on the speaker, the ambient conditions, microphone placement, etc.)
• Possible solution – a parametric model in the form of a Hidden Markov Model
• Notice that other solutions may also apply (for example, neural networks)
Hidden Markov Model
• A simple example of an HMM: [figure omitted; the later numerical examples assume a small model with binary observations 0/1]
Hidden Markov Model – Forward Algorithm
• Given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)?
• For a path $q = 12312$ and a given HMM $\lambda$: $P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$
• Therefore, summing over all possible paths: $P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda)$ (a brute-force sketch follows)
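Here is a brute-force sketch of that sum (the original slides' parameter values are not recoverable, so the 3-state HMM below uses made-up numbers; the later sketches reuse these definitions):

```python
import itertools
import numpy as np

# Hypothetical parameters for a 3-state HMM with observations in {0, 1}
pi = np.array([1.0, 0.0, 0.0])      # initial state distribution
A = np.array([[0.5, 0.5, 0.0],      # A[i, j] = P(next state j | state i)
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
B = np.array([[0.8, 0.2],           # B[i, k] = P(observe symbol k | state i)
              [0.3, 0.7],
              [0.5, 0.5]])

def brute_force_likelihood(obs):
    """Sum P(O, q | lambda) over all N**T state paths q."""
    total = 0.0
    for q in itertools.product(range(3), repeat=len(obs)):
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([1, 0, 1, 1, 0]))
```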
• Complexity: for a sequence of $T$ observations, each path requires about $2T$ multiplications. The total number of paths is $N^T$, therefore direct evaluation costs on the order of $2T \cdot N^T$ operations
• A more efficient approach – the forward algorithm
• Forward algorithm: calculates the probabilities of all partial sequences at each time step, reusing the results from the previous step (dynamic programming)
• Define $\alpha_t(i)$ – the probability of being in state $i$ at time $t$ having observed the partial sequence $o_1, \ldots, o_t$: $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$
• $\alpha_t(i)$ is computed inductively:
Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1)$
Induction: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$
Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
• Complexity: at time $t$, each calculation involves only the $N$ previous values $\alpha_{t-1}(j)$. The sequence length is $T$, therefore each state requires on the order of $NT$ operations, and the total $N$ states require on the order of $N^2 T$ – see the sketch below
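A forward-recursion sketch, reusing pi, A, B, and numpy from the brute-force example above:

```python
def forward_likelihood(obs):
    """P(O | lambda) via the forward recursion: O(N^2 T) instead of O(2T * N^T)."""
    alpha = pi * B[:, obs[0]]              # initialization: alpha_1(i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction: alpha_{t+1}(j)
    return alpha.sum()                     # termination: sum_i alpha_T(i)

# Agrees with the brute-force sum over all 3**5 paths
print(forward_likelihood([1, 0, 1, 1, 0]))
```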
Hidden Markov Model – Viterbi algorithm
• Previously: given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM?
• We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?
• Define $\delta_t(i)$ as the probability of the best path from the start state to state $i$ at time $t$: $\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = i, o_1, \ldots, o_t \mid \lambda)$
• $\max_i \delta_T(i)$ is our objective
• We solve this with the same recursion as the forward algorithm, but this time with maximization instead of summation: $\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$, keeping back-pointers to recover the path
Hidden Markov Model – Viterbi algorithm, example
• Observation sequence: 101 (see the sketch below)
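A Viterbi sketch for the 101 example, again reusing the hypothetical pi, A, B from above (the original slide's worked numbers are not recoverable):

```python
def viterbi(obs):
    """Most likely state path via the max-product recursion with back-pointers."""
    delta = pi * B[:, obs[0]]                 # delta_1(i)
    backptr = []
    for o in obs[1:]:
        trans = delta[:, None] * A            # trans[i, j] = delta_t(i) * a_ij
        backptr.append(trans.argmax(axis=0))  # best predecessor of each state j
        delta = trans.max(axis=0) * B[:, o]   # delta_{t+1}(j)
    path = [int(delta.argmax())]              # best final state
    for bp in reversed(backptr):              # trace the back-pointers
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())

print(viterbi([1, 0, 1]))   # best path (states numbered 0..2) and its probability
```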
Hidden Markov Model – model fitting
• In practice, the parameters of the HMM are unknown
• We are interested in the maximum-likelihood estimate $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$
• There is no analytical maximum-likelihood solution. We turn to the Baum-Welch algorithm, also known as the forward-backward algorithm
• Basic idea – count the expected visits to each state and the expected number of transitions to derive probability estimators
• Define the backward variable $\beta_t(i) = P(o_{t+1}, \ldots, o_T \mid q_t = i, \lambda)$.
This is the conditional probability that $o_{t+1}, \ldots, o_T$ are observed, given that the system is in state $i$ at time $t$ and the model $\lambda$. This can be calculated inductively: $\beta_T(i) = 1$ and $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
• Recall $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$ – the probability of being in state $i$ at time $t$ having observed the partial sequence $o_1, \ldots, o_t$
• Therefore, the probability of being in state $i$ at time $t$, given the entire observation sequence and the model, is simply: $\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$
• Define $\xi_t(i, j)$ – the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the model and the observation sequence: $\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$. Note that $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$
• We are now ready to introduce the parameter estimators.
Transition probability estimator: $\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
The expected number of transitions from state $i$ to state $j$, normalized by the expected number of visits to state $i$
Observation probability estimator: $\hat{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
The expected number of times the system is in state $j$ while symbol $v_k$ is observed, normalized by the expected number of times the system visits state $j$
• Notice that the parameters we wish to estimate actually appear on both sides of these equations: $\hat{a}_{ij}$ and $\hat{b}_j(k)$ are computed from $\alpha$, $\beta$, $\gamma$, and $\xi$, which themselves depend on the current parameters $\lambda = (\pi, A, B)$
• Therefore, we use an iterative procedure: starting from an initial guess for the parameters, we gradually update them at each iteration and terminate once the change in the parameters falls below a chosen threshold (see the sketch below)
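A compact Baum-Welch sketch for the discrete case (a single observation sequence, a fixed iteration count instead of a convergence test, no smoothing; it reuses numpy and the toy pi, A, B from above):

```python
def baum_welch(obs, pi, A, B, n_iter=50):
    """Re-estimate (pi, A, B) from one observation sequence via EM."""
    obs = np.array(obs)
    N, T = A.shape[0], len(obs)
    for _ in range(n_iter):
        # E-step: forward and backward variables
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood                # gamma[t, i]
        xi = (alpha[:-1, :, None] * A[None, :, :] *      # xi[t, i, j]
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        # M-step: expected counts, normalized
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return pi, A, B

print(baum_welch([1, 0, 1, 1, 0, 0, 1], pi.copy(), A.copy(), B.copy()))
```

On long sequences a practical implementation would scale $\alpha$ and $\beta$ (or work in the log domain) to avoid numerical underflow.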
• For continuous observations, the discrete emission probabilities $b_j(k)$ are replaced by densities, for example a Gaussian $b_j(o) = \mathcal{N}(o; \mu_j, \sigma_j^2)$
• We estimate the mean and variance for each state $j$ as $\gamma$-weighted averages: $\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad \hat{\sigma}_j^2 = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}$ (see the sketch below)
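The corresponding M-step in code (gamma as produced by the Baum-Welch sketch above, obs now real-valued; this is a 1-D sketch, while real systems use multivariate mixture densities):

```python
def gaussian_mstep(obs, gamma):
    """Gamma-weighted mean and variance per state (continuous M-step)."""
    obs = np.asarray(obs, dtype=float)                        # shape (T,)
    w = gamma.sum(axis=0)                                     # expected visits, (N,)
    mu = (gamma * obs[:, None]).sum(axis=0) / w               # weighted means
    var = (gamma * (obs[:, None] - mu) ** 2).sum(axis=0) / w  # weighted variances
    return mu, var

# Toy check: with uniform gamma this reduces to the ordinary mean and variance
print(gaussian_mstep([0.0, 1.0, 2.0, 3.0, 4.0], np.ones((5, 3))))  # means 2.0, vars 2.0
```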
Conclusions and final remarks
• We learned how to:
I. Estimate HMM parameters from a sequence of observations
II. Determine the probability of observing a sequence given an HMM
III. Determine the most likely sequence of states, given an HMM and a sequence of observations
• Notice that the states may represent words, syllables, phonemes, etc. This is up to the system architect to decide
• For example, words are more informative than syllables, but result in more states and less accurate probability estimation (curse of dimensionality)
Questions?
Thank you!