Automated Speech Recognition
By: Amichai Painsky
Automated Speech Recognition - setup
• Input – speech waveform
• Preprocessing
• Modeling
• Output – transcription: "The boy is in the red house"
ASR - basics
• Observations $O = o_1, o_2, \ldots, o_T$ representing a speech signal
• Vocabulary $V$ of different words
• Our goal – find the most likely word sequence $\hat{W} = \arg\max_{W} P(W \mid O)$
• Since $P(W \mid O) = \frac{P(O \mid W)\,P(W)}{P(O)}$, we have $\hat{W} = \arg\max_{W} P(O \mid W)\,P(W)$, where $P(W)$ is the language modeling term and $P(O \mid W)$ is the acoustic modeling term (see the toy sketch below)
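As a toy illustration of this decision rule (all probabilities below are made up, and the candidate set is hypothetical), decoding picks the candidate sentence that maximizes the product of the acoustic and language scores:

```python
# Hypothetical scores: P(O|W) from the acoustic model, P(W) from the language model
candidates = {
    "the boy is in the red house":  (0.020, 1e-6),
    "the buoy is in the red house": (0.022, 1e-9),
    "the boy is in the red mouse":  (0.015, 1e-8),
}
best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
print(best)  # -> "the boy is in the red house"
```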
Observations preprocessing
• A sampled waveform is converted into a sequence of parameter vectors at a certain frame rate
• A frame rate of 10 ms is usually taken, because a speech signal is assumed to be stationary for about 10 ms
• Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, and psychophysical knowledge of human perception; a minimal sketch of the framing step follows
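This is a minimal framing sketch (assuming a 16 kHz mono signal; the 25 ms window and 10 ms hop are common but illustrative choices), not a full feature extractor:

```python
import numpy as np

def frame_signal(waveform, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a 1-D waveform into overlapping frames (25 ms window, 10 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)       # samples between frame starts
    n_frames = 1 + max(0, (len(waveform) - frame_len) // hop_len)
    # Each frame would then be mapped to a feature vector (e.g. MFCCs)
    return np.stack([waveform[i * hop_len : i * hop_len + frame_len]
                     for i in range(n_frames)])

# One second of a 440 Hz tone -> 98 frames of 400 samples each
t = np.arange(16000) / 16000
print(frame_signal(np.sin(2 * np.pi * 440 * t)).shape)  # (98, 400)
```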
Language modeling
• Most generally, by the chain rule, the probability of a sequence of $m$ words is $P(w_1, \ldots, w_m) = P(w_1) \prod_{i=2}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$
• Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models approximate $P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-1})$ (see the sketch after this list)
• More powerful trigram models use a two-word history: $P(w_i \mid w_{i-2}, w_{i-1})$
• Longer history -> exponentially increasing number of models -> more data is required to train, more parameters, more overfitting
• Prediction by Partial Matching (PPM) models back off between histories of different lengths
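A minimal bigram sketch (toy corpus, maximum-likelihood counts, no smoothing, which a real language model would need):

```python
from collections import Counter

corpus = "the boy is in the red house . the boy is in the house .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    """Maximum-likelihood estimate P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("boy", "the"))   # 2/4 = 0.5
print(p_bigram("red", "the"))   # 1/4 = 0.25
```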
Acoustic modeling
• Determines what sound is pronounced when a given sentence is uttered
• Number of possibilities is infinite! (it depends on the speaker, the ambient conditions, microphone placement, etc.)
• Possible solution – a parametric model in the form of a Hidden Markov Model
• Notice that other solutions may also apply (for example, neural networks)
Hidden Markov Model
• A simple example of an HMM: [figure omitted; the later numerical examples assume a small model with binary observations 0/1]
Hidden Markov Model – Forward Algorithm
• Given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)?
• For a path $q = 12312$ and a given HMM $\lambda$: $P(O, q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)$
• Therefore, summing over all possible paths: $P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda)$ (a brute-force sketch follows)
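Here is a brute-force sketch of that sum (the original slides' parameter values are not recoverable, so the 3-state HMM below uses made-up numbers; the later sketches reuse these definitions):

```python
import itertools
import numpy as np

# Hypothetical parameters for a 3-state HMM with observations in {0, 1}
pi = np.array([1.0, 0.0, 0.0])      # initial state distribution
A = np.array([[0.5, 0.5, 0.0],      # A[i, j] = P(next state j | state i)
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
B = np.array([[0.8, 0.2],           # B[i, k] = P(observe symbol k | state i)
              [0.3, 0.7],
              [0.5, 0.5]])

def brute_force_likelihood(obs):
    """Sum P(O, q | lambda) over all N**T state paths q."""
    total = 0.0
    for q in itertools.product(range(3), repeat=len(obs)):
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total

print(brute_force_likelihood([1, 0, 1, 1, 0]))
```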
• Complexity: for a sequence of $T$ observations, each path requires about $2T$ multiplications. The total number of paths is $N^T$, therefore direct evaluation costs on the order of $2T \cdot N^T$ operations
• A more efficient approach – the forward algorithm
• Forward algorithm: calculates the probabilities of all partial sequences at each time step, reusing the results from the previous step (dynamic programming)
• Define $\alpha_t(i)$ – the probability of being in state $i$ at time $t$ having observed the partial sequence $o_1, \ldots, o_t$: $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$
• $\alpha_t(i)$ is computed inductively:
Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1)$
Induction: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1})$
Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
• Complexity: at time $t$, each calculation involves only the $N$ previous values $\alpha_{t-1}(j)$. The sequence length is $T$, therefore each state requires on the order of $NT$ operations, and the total $N$ states require on the order of $N^2 T$ – see the sketch below
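A forward-recursion sketch, reusing pi, A, B, and numpy from the brute-force example above:

```python
def forward_likelihood(obs):
    """P(O | lambda) via the forward recursion: O(N^2 T) instead of O(2T * N^T)."""
    alpha = pi * B[:, obs[0]]              # initialization: alpha_1(i)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction: alpha_{t+1}(j)
    return alpha.sum()                     # termination: sum_i alpha_T(i)

# Agrees with the brute-force sum over all 3**5 paths
print(forward_likelihood([1, 0, 1, 1, 0]))
```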
Hidden Markov Model – Viterbi algorithm
• Previously: given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM?
• We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?
• Define $\delta_t(i)$ as the probability of the best path from the start state to state $i$ at time $t$: $\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = i, o_1, \ldots, o_t \mid \lambda)$
• $\max_i \delta_T(i)$ is our objective
• We solve this with the same recursion as the forward algorithm, but this time with maximization instead of summation: $\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$, keeping back-pointers to recover the path
Hidden Markov Model – Viterbi algorithm, example
• Observation sequence: 101 (see the sketch below)
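A Viterbi sketch for the 101 example, again reusing the hypothetical pi, A, B from above (the original slide's worked numbers are not recoverable):

```python
def viterbi(obs):
    """Most likely state path via the max-product recursion with back-pointers."""
    delta = pi * B[:, obs[0]]                 # delta_1(i)
    backptr = []
    for o in obs[1:]:
        trans = delta[:, None] * A            # trans[i, j] = delta_t(i) * a_ij
        backptr.append(trans.argmax(axis=0))  # best predecessor of each state j
        delta = trans.max(axis=0) * B[:, o]   # delta_{t+1}(j)
    path = [int(delta.argmax())]              # best final state
    for bp in reversed(backptr):              # trace the back-pointers
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())

print(viterbi([1, 0, 1]))   # best path (states numbered 0..2) and its probability
```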
Hidden Markov Model – model fitting
• In practice, the parameters of the HMM are unknown
• We are interested in the maximum-likelihood estimate $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$
• There is no analytical maximum-likelihood solution. We turn to the Baum-Welch algorithm, also known as the forward-backward algorithm
• Basic idea – count the expected visits to each state and the expected number of transitions to derive probability estimators
• Define the backward variable $\beta_t(i) = P(o_{t+1}, \ldots, o_T \mid q_t = i, \lambda)$.
This is the conditional probability that $o_{t+1}, \ldots, o_T$ are observed, given that the system is in state $i$ at time $t$ and the model $\lambda$. This can be calculated inductively: $\beta_T(i) = 1$ and $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$
• Recall $\alpha_t(i) = P(o_1, \ldots, o_t, q_t = i \mid \lambda)$ – the probability of being in state $i$ at time $t$ having observed the partial sequence $o_1, \ldots, o_t$
• Therefore, the probability of being in state $i$ at time $t$, given the entire observation sequence and the model, is simply: $\gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$
• Define $\xi_t(i, j)$ – the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the model and the observation sequence: $\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$. Note that $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$
• We are now ready to introduce the parameter estimators.
Transition probability estimator: $\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
The expected number of transitions from state $i$ to state $j$, normalized by the expected number of visits to state $i$
Observation probability estimator: $\hat{b}_j(k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
The expected number of times the system is in state $j$ while symbol $v_k$ is observed, normalized by the expected number of times the system visits state $j$
• Notice that the parameters we wish to estimate actually appear on both sides of these equations: $\hat{a}_{ij}$ and $\hat{b}_j(k)$ are computed from $\alpha$, $\beta$, $\gamma$, and $\xi$, which themselves depend on the current parameters $\lambda = (\pi, A, B)$
• Therefore, we use an iterative procedure: starting from an initial guess for the parameters, we gradually update them at each iteration and terminate once the change in the parameters falls below a chosen threshold (see the sketch below)
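A compact Baum-Welch sketch for the discrete case (a single observation sequence, a fixed iteration count instead of a convergence test, no smoothing; it reuses numpy and the toy pi, A, B from above):

```python
def baum_welch(obs, pi, A, B, n_iter=50):
    """Re-estimate (pi, A, B) from one observation sequence via EM."""
    obs = np.array(obs)
    N, T = A.shape[0], len(obs)
    for _ in range(n_iter):
        # E-step: forward and backward variables
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood                # gamma[t, i]
        xi = (alpha[:-1, :, None] * A[None, :, :] *      # xi[t, i, j]
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        # M-step: expected counts, normalized
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(B.shape[1]):
            B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return pi, A, B

print(baum_welch([1, 0, 1, 1, 0, 0, 1], pi.copy(), A.copy(), B.copy()))
```

On long sequences a practical implementation would scale $\alpha$ and $\beta$ (or work in the log domain) to avoid numerical underflow.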
• For continuous observations, the discrete emission probabilities $b_j(k)$ are replaced by densities, for example a Gaussian $b_j(o) = \mathcal{N}(o; \mu_j, \sigma_j^2)$
• We estimate the mean and variance for each state $j$ as $\gamma$-weighted averages: $\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad \hat{\sigma}_j^2 = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (o_t - \hat{\mu}_j)^2}{\sum_{t=1}^{T} \gamma_t(j)}$ (see the sketch below)
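The corresponding M-step in code (gamma as produced by the Baum-Welch sketch above, obs now real-valued; this is a 1-D sketch, while real systems use multivariate mixture densities):

```python
def gaussian_mstep(obs, gamma):
    """Gamma-weighted mean and variance per state (continuous M-step)."""
    obs = np.asarray(obs, dtype=float)                        # shape (T,)
    w = gamma.sum(axis=0)                                     # expected visits, (N,)
    mu = (gamma * obs[:, None]).sum(axis=0) / w               # weighted means
    var = (gamma * (obs[:, None] - mu) ** 2).sum(axis=0) / w  # weighted variances
    return mu, var

# Toy check: with uniform gamma this reduces to the ordinary mean and variance
print(gaussian_mstep([0.0, 1.0, 2.0, 3.0, 4.0], np.ones((5, 3))))  # means 2.0, vars 2.0
```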
Conclusions and final remarks
• We learned how to:
I. Estimate HMM parameters from a sequence of observations
II. Determine the probability of observing a sequence given an HMM
III. Determine the most likely sequence of states, given an HMM and a sequence of observations
• Notice that the states may represent words, syllables, phonemes, etc. This is up to the system architect to decide
• For example, words are more informative than syllables, but result in more states and less accurate probability estimation (curse of dimensionality)
Questions?
Thank you!