
Page 1: Automated Speech Recognition

By: Amichai Painsky

Page 2: Automated Speech Recognition - setup

• Input – speech waveform

• Preprocessing

• Modeling

• Output – transcription: "The boy is in the red house"

Page 3: ASR - basics

• Observations $O = o_1, o_2, \ldots, o_T$ representing a speech signal

• Vocabulary $V$ of different words

• Our goal – find the most likely word sequence $\hat{W} = \arg\max_{W} P(W \mid O)$

• Since

$$P(W \mid O) = \frac{P(O \mid W)\, P(W)}{P(O)}$$

we have

$$\hat{W} = \arg\max_{W} \underbrace{P(W)}_{\text{language modeling}} \; \underbrace{P(O \mid W)}_{\text{acoustic modeling}}$$
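The decomposition translates directly into decoder code: each candidate word sequence is scored by the sum of a language-model log-probability and an acoustic-model log-probability. A minimal sketch, assuming hypothetical scoring callables `lm_logprob` and `am_logprob` and an explicit candidate list (a real decoder searches this space rather than enumerating it):

```python
import math

def decode(observations, candidates, lm_logprob, am_logprob):
    """Return the candidate W maximizing log P(W) + log P(O | W)."""
    best, best_score = None, -math.inf
    for words in candidates:
        # log P(W) from the language model + log P(O | W) from the acoustic model
        score = lm_logprob(words) + am_logprob(observations, words)
        if score > best_score:
            best, best_score = words, score
    return best, best_score
```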

Page 4: Observations preprocessing

• A sampled waveform is converted into a sequence of parameter vectors at a certain frame rate

• A frame rate of one frame per 10 ms is usually taken, because a speech signal can be assumed stationary over roughly 10 ms

• Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, and psychophysical knowledge of human perception
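To make the framing step concrete, here is a minimal numpy sketch; the 25 ms window and 10 ms hop at a 16 kHz sampling rate are assumed values, and feature extraction (e.g., MFCCs) would follow per frame:

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Slice a waveform into overlapping windowed frames (assumes len(signal) >= window)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)       # 160 samples = one frame every 10 ms
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([signal[t * hop_len : t * hop_len + frame_len]
                       for t in range(n_frames)])
    return frames * np.hamming(frame_len)            # taper each frame before analysis
```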

Page 5: Language modeling

• Most generally, the probability of a sequence of $m$ words is

$$P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1})$$

• Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models:

$$P(w_1, \ldots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-1})$$

• More powerful two-word-history (trigram) models: $P(w_i \mid w_{i-2}, w_{i-1})$

• Longer history -> exponentially increasing number of models -> more data required for training, more parameters, more overfitting

• Partial Matching modeling (backing off to shorter histories when long-history counts are sparse)
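A count-based bigram model is a few lines of code; this sketch uses maximum-likelihood estimates from a toy corpus (real systems add smoothing or back-off so unseen bigrams do not get zero probability):

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram estimates P(w_i | w_{i-1}) from raw counts."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        words = ["<s>"] + words + ["</s>"]           # sentence boundary markers
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return {pair: n / unigrams[pair[0]] for pair, n in bigrams.items()}

probs = train_bigram([["the", "boy", "is", "in", "the", "red", "house"]])
print(probs[("the", "boy")])  # 0.5 – "the" is followed by "boy" once out of two
```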

Page 6: Acoustic modeling

• Determines which sounds are produced when a given sentence is uttered

• The number of possibilities is infinite! (it depends on the speaker, the ambience, microphone placement, etc.)

• Possible solution – a parametric model in the form of a Hidden Markov Model

• Notice that other solutions may also apply (for example, neural networks)

Page 7: Hidden Markov Model

• A simple example of an HMM:
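A minimal concrete instance, with three hidden states and a binary observation alphabet {0, 1} to match the sequences used on the following slides; the probability values themselves are illustrative assumptions, not the ones from the original diagram:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])           # initial state distribution pi_i
A = np.array([[0.6, 0.3, 0.1],           # transition probabilities a_ij
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.3],                # emission probabilities b_i(k) for k in {0, 1}
              [0.4, 0.6],
              [0.1, 0.9]])
```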

Page 8: Hidden Markov Model

Page 9: Hidden Markov Model – Forward Algorithm

• Given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)?

• For a path $q = (1, 2, 3, 1, 2)$ and a given HMM $\lambda = (A, B, \pi)$:

$$P(O, q \mid \lambda) = \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$$

• Therefore, summing over all possible paths:

$$P(O \mid \lambda) = \sum_{q} P(O, q \mid \lambda)$$

Page 10: Hidden Markov Model – Forward Algorithm

• Complexity: for a sequence of $T$ observations, each path necessitates about $2T$ multiplications. The total number of paths is $N^T$, so direct evaluation requires on the order of $2T \cdot N^T$ operations

• A more efficient approach – the forward algorithm

Page 11: Hidden Markov Model – Forward Algorithm

• Forward algorithm: calculates the probabilities of all partial sequences at each time step, reusing the results from the previous step (dynamic programming)

• Define $\alpha_t(i)$ – the probability of being at state $i$ at time $t$ and having observed the partial sequence $o_1, \ldots, o_t$:

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t,\ q_t = i \mid \lambda)$$

Page 12: Hidden Markov Model – Forward Algorithm

• The quantity $\alpha_t(i)$ defined above is computed recursively:

$$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \Big] b_j(o_{t+1}), \qquad P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
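The recursion is a direct transcription into numpy; this sketch reuses the assumed toy parameters from page 7 (production implementations rescale alpha at each step or work in log space to avoid underflow):

```python
import numpy as np

# Assumed toy parameters from the page-7 sketch.
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])

def forward(obs, pi, A, B):
    """P(O | lambda) for a list of observation symbol indices."""
    alpha = pi * B[:, obs[0]]              # alpha_1(i) = pi_i b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] b_j(o_{t+1})
    return alpha.sum()                     # P(O | lambda) = sum_i alpha_T(i)

print(forward([1, 0, 1, 1, 0], pi, A, B))  # probability of observing 10110
```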

Page 13: Hidden Markov Model – Forward Algorithm

• Complexity – at time $t$, each calculation involves only the $N$ previous values $\alpha_{t-1}(i)$. The length of the sequence is $T$, so each state requires on the order of $NT$ operations, and the total for all $N$ states is $O(N^2 T)$

Page 14: Hidden Markov Model – Viterbi algorithm

• Previously: given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM?

• We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?

Page 15: Hidden Markov Model – Viterbi algorithm

• Define $\delta_t(i)$ as the probability of the best path that ends at state $i$ at time $t$:

$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1},\ q_t = i,\ o_1, \ldots, o_t \mid \lambda)$$

• $\max_i \delta_T(i)$ is our objective

• We solve this with the same forward recursion as before, but this time with maximization instead of summation (keeping backpointers so the best state sequence can be read off at the end)
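A sketch of the Viterbi recursion with backpointers, again under the assumed toy parameters:

```python
import numpy as np

# Assumed toy parameters from the page-7 sketch.
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])

def viterbi(obs, pi, A, B):
    """Most likely state sequence (0-indexed) and its joint probability."""
    delta = pi * B[:, obs[0]]                   # delta_1(i) = pi_i b_i(o_1)
    backptr = []
    for o in obs[1:]:
        scores = delta[:, None] * A             # scores[i, j] = delta_t(i) a_ij
        backptr.append(scores.argmax(axis=0))   # best predecessor of each state j
        delta = scores.max(axis=0) * B[:, o]    # delta_{t+1}(j): max instead of sum
    path = [int(delta.argmax())]                # best final state
    for bp in reversed(backptr):                # trace backpointers to the start
        path.append(int(bp[path[-1]]))
    return list(reversed(path)), float(delta.max())
```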

Page 16: Hidden Markov Model – Viterbi algorithm

Page 17: Hidden Markov Model – Viterbi algorithm, example

• Observation sequence: 101
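With the Viterbi sketch above and the assumed toy parameters, decoding this sequence is one call (the resulting path reflects the illustrative parameters, not necessarily the original slide's worked answer):

```python
path, prob = viterbi([1, 0, 1], pi, A, B)
print(path, prob)  # most likely state sequence for 101 and its joint probability
```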

Page 18: Hidden Markov Model – model fitting

• In practice, the parameters of the HMM are unknown

• We are interested in the maximum-likelihood estimate $\hat{\lambda} = \arg\max_{\lambda} P(O \mid \lambda)$

• There is no analytical maximum-likelihood solution, so we turn to the Baum-Welch algorithm, also known as the forward-backward algorithm

• Basic idea – count the expected visits to each state and the expected number of transitions, and use these counts to derive probability estimators

Page 19: Hidden Markov Model – model fitting

• Define:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$$

This is the conditional probability that $o_{t+1}, \ldots, o_T$ are observed, given that the system is in state $i$ at time $t$ and given the model $\lambda$. This can be calculated inductively, backwards in time:

$$\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

• Recall $\alpha_t(i)$ – the probability of being at state $i$ at time $t$ and having observed the partial sequence $o_1, \ldots, o_t$: $\alpha_t(i) = P(o_1, \ldots, o_t,\ q_t = i \mid \lambda)$

Page 20: Hidden Markov Model – model fitting

• Combining $\alpha_t(i)$ and $\beta_t(i)$ defined above, the probability of being in state $i$ at time $t$, given the entire observation sequence and the model, is simply:

$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$$
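The backward pass mirrors the forward one, and gamma then falls out by elementwise multiplication and normalization; a sketch under the same assumed toy parameters:

```python
import numpy as np

# Assumed toy parameters from the page-7 sketch.
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])

def alphas_betas(obs, pi, A, B):
    """Forward (alpha) and backward (beta) lattices, each of shape (T, N)."""
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                     # induction, backwards in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return alpha, beta

alpha, beta = alphas_betas([1, 0, 1, 1, 0], pi, A, B)
gamma = alpha * beta / (alpha * beta).sum(axis=1, keepdims=True)
print(gamma[2])  # P(q_3 = i | O, lambda) for each state i
```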

Page 21: Hidden Markov Model – model fitting

• Define the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i, j) = P(q_t = i,\ q_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$

Page 22: Hidden Markov Model – model fitting

• We are now ready to introduce the parameter estimators:

Transition probability estimator:

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

The expected number of transitions from state $i$ to state $j$, normalized by the expected number of visits to state $i$

Page 23: Hidden Markov Model – model fitting

• We are now ready to introduce the parameter estimators:

Observation probability estimator:

$$\hat{b}_j(k) = \frac{\sum_{t\,:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

The expected number of times the system is in state $j$ when the symbol $v_k$ is observed, normalized by the expected number of times the system visits state $j$

Page 24: Hidden Markov Model – model fitting

• Notice that the parameters we wish to estimate actually appear on both sides of the equations: $\xi_t(i, j)$ and $\gamma_t(i)$ are themselves computed from the current $a_{ij}$ and $b_j(k)$ through $\alpha$ and $\beta$

Page 25: Hidden Markov Model – model fitting

• As noted, the parameters we wish to estimate appear on both sides of the equations

• Therefore, we use an iterative procedure: after starting with an initial guess for the parameters, we gradually update them at each iteration and terminate once the parameter changes fall below a chosen tolerance
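Putting the pieces together, a compact Baum-Welch sketch for discrete observations, run on the assumed toy model (production code adds per-step scaling for long sequences, multiple training sequences, and a likelihood-based stopping rule):

```python
import numpy as np

def baum_welch(obs, pi, A, B, n_iter=100, tol=1e-6):
    """Re-estimate (pi, A, B) from one discrete observation sequence."""
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    for _ in range(n_iter):
        # E-step: forward and backward lattices.
        alpha, beta = np.zeros((T, N)), np.ones((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        gamma = alpha * beta / likelihood              # gamma_t(i)
        xi = (alpha[:-1, :, None] * A[None, :, :] *    # xi_t(i, j)
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        # M-step: the expected-count estimators from pages 22-23.
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.stack([gamma[obs == k].sum(axis=0)
                          for k in range(B.shape[1])], axis=1) / gamma.sum(axis=0)[:, None]
        change = max(np.abs(new_A - A).max(), np.abs(new_B - B).max())
        pi, A, B = new_pi, new_A, new_B
        if change < tol:                               # parameters stopped changing
            break
    return pi, A, B

# Assumed toy parameters from the page-7 sketch as the initial guess.
pi = np.array([0.5, 0.3, 0.2])
A = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
pi, A, B = baum_welch([1, 0, 1, 1, 0, 1, 1, 0, 0, 1], pi, A, B)
```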

Page 26: Hidden Markov Model – model fitting

• For continuous observations, each state's emission distribution is modeled parametrically (typically as a Gaussian)

• We estimate the mean and variance for each state $j$ using the same expected counts:

$$\hat{\mu}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, o_t}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad \hat{\Sigma}_j = \frac{\sum_{t=1}^{T} \gamma_t(j)\, (o_t - \hat{\mu}_j)(o_t - \hat{\mu}_j)^{\top}}{\sum_{t=1}^{T} \gamma_t(j)}$$
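In code, these are just gamma-weighted sample statistics; a sketch assuming `gamma` comes from a forward-backward pass and `obs` is now a (T, d) array of feature vectors:

```python
import numpy as np

def gaussian_updates(obs, gamma):
    """Weighted mean (N, d) and covariance (N, d, d) per state."""
    w = gamma / gamma.sum(axis=0)                      # normalize weights per state
    means = w.T @ obs                                  # mu_j = sum_t w_t(j) o_t
    covs = []
    for j in range(gamma.shape[1]):
        centered = obs - means[j]                      # o_t - mu_j
        covs.append((w[:, j, None] * centered).T @ centered)
    return means, np.stack(covs)
```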

Page 27: Conclusions and final remarks

• We learned how to:

I. Estimate HMM parameters from a sequence of observations

II. Determine the probability of observing a sequence given an HMM

III. Determine the most likely sequence of states, given an HMM and a sequence of observations

• Notice that the states may represent words, syllables, phonemes, etc. This is up to the system architect to decide

• For example, words are more informative than syllables, but result in more states and less accurate probability estimation (curse of dimensionality)

Page 28: Questions?

Thank you!