uday-kumar
7/28/2019 Speech Processing2
Speech Recognition
Speech Recognition
Simplest method: directly compare the input signal with each reference signal,
E = (Input speech) − (Reference speech),
and choose the reference speech that gives the smallest error E.
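A minimal sketch of this direct comparison, assuming E is the sum of squared sample differences (the slide does not name an error measure) and that every signal has the same length. The helper names and the toy sinusoid "signals" are hypothetical.

```python
import math

def squared_error(x, r):
    """E = sum over samples of (input - reference)^2; signals must be equal length."""
    if len(x) != len(r):
        raise ValueError("direct comparison needs signals of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, r))

def best_reference(x, references):
    """Return the label of the reference signal with the smallest error E."""
    return min(references, key=lambda label: squared_error(x, references[label]))

# Toy "reference speeches": two sinusoids of different frequency, 100 samples each.
refs = {
    "low": [math.sin(2 * math.pi * 3 * n / 100) for n in range(100)],
    "high": [math.sin(2 * math.pi * 9 * n / 100) for n in range(100)],
}
# Unknown input: a quieter version of the "low" signal.
unknown = [0.9 * math.sin(2 * math.pi * 3 * n / 100) for n in range(100)]
print(best_reference(unknown, refs))  # low
```

The equal-length requirement already shows why this breaks down for real speech: the same word is never spoken with exactly the same timing twice.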
Difficulties:
The same words are spoken differently each time, so the signals never match exactly.
Comparing against every stored signal is impractical for a large database.
Speech Recognition
Solve it as a pattern recognition problem:
Input signal → Feature Extraction → feature vector sequence → Pattern Matching → Decision Making → Output
Reference signal → Feature Extraction → Reference Patterns (used by Pattern Matching)
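The slide does not say how Pattern Matching is done. One classic choice for whole-word templates is dynamic time warping (DTW), which aligns two feature-vector sequences of different lengths; the sketch below uses DTW as a swapped-in technique, with Euclidean frame distance and toy 2-dimensional "feature vectors" that are purely hypothetical.

```python
import math

def euclidean(u, v):
    """Distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature-vector sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # D[i][j] = cost of the best alignment of seq_a[:i] with seq_b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def recognize(input_feats, reference_patterns):
    """Decision making: pick the reference pattern with the smallest DTW distance."""
    return min(reference_patterns,
               key=lambda w: dtw_distance(input_feats, reference_patterns[w]))

refs = {
    "yes": [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],
    "no": [[2.0, 2.0], [2.0, 2.0], [0.0, 0.0]],
}
# "yes" spoken more slowly: the middle frame is repeated.
slow_yes = [[0.0, 1.0], [1.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
print(recognize(slow_yes, refs))  # yes
```

Because the time axis is warped, the slower input still aligns perfectly with the "yes" template, which a sample-by-sample comparison could not do.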
Basic Approach
A simplified block diagram:
Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence
The decoding and search step draws on three knowledge sources:
Acoustic Models, obtained by Acoustic Model Training on Speech Corpora
Lexicon, obtained from a Lexical Knowledge-base
Language Model, obtained by Language Model Construction on Text Corpora, together with a Grammar
Front-end Signal Processing
End-point detection (speech/silence discrimination)
Noise reduction:
clean signal = (input) − (environmental noise)
How do we get the environmental noise?
Simplest way: take the first 10 frames, assumed to contain no speech, as the noise estimate
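A minimal sketch of both steps, assuming the first 10 frames are silence. End-point detection here is a simple energy threshold (the `factor` of 3 is an arbitrary assumption), and noise reduction is done as spectral subtraction on magnitude spectra, one common reading of "clean = input − noise". All function names are hypothetical.

```python
def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)

def detect_endpoints(frames, n_noise=10, factor=3.0):
    """Energy-based end-point detection.
    Frames whose energy exceeds `factor` times the mean energy of the first
    `n_noise` frames (assumed silence) count as speech.
    Returns (first, last) speech frame index, or None if no speech is found."""
    noise_frames = frames[:n_noise]
    noise_energy = sum(frame_energy(f) for f in noise_frames) / len(noise_frames)
    speech = [i for i, f in enumerate(frames) if frame_energy(f) > factor * noise_energy]
    return (speech[0], speech[-1]) if speech else None

def spectral_subtraction(mag_spectra, n_noise=10):
    """clean = input - noise, bin by bin, on magnitude spectra.
    The noise spectrum is the mean of the first `n_noise` frames;
    negative results are floored at zero."""
    noise_frames = mag_spectra[:n_noise]
    n_bins = len(mag_spectra[0])
    noise = [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]
    return [[max(f[k] - noise[k], 0.0) for k in range(n_bins)] for f in mag_spectra]

# 12 quiet frames, 5 loud "speech" frames, 3 quiet frames.
quiet = [0.01, -0.01] * 4
loud = [0.5, -0.5] * 4
frames = [quiet] * 12 + [loud] * 5 + [quiet] * 3
print(detect_endpoints(frames))  # (12, 16)
```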
Front-end Signal Processing
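Windowing: the signal is cut into frames, each frame is multiplied by a window, and the DFT of each windowed frame gives its spectrum.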
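A minimal sketch of framing, windowing and the DFT. The slide names no window or frame size, so the Hamming window and the 400-sample frames with a 160-sample hop (25 ms and 10 ms at 16 kHz) are assumptions; the plain O(N²) DFT stands in for the FFT a real front end would use.

```python
import cmath
import math

def make_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def short_time_spectra(signal, frame_len=400, hop=160):
    """Magnitude spectrum (bins 0..frame_len/2) of each windowed frame."""
    w = hamming(frame_len)
    spectra = []
    for frame in make_frames(signal, frame_len, hop):
        X = dft([s * wi for s, wi in zip(frame, w)])
        spectra.append([abs(X[k]) for k in range(frame_len // 2 + 1)])
    return spectra

fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]  # 50 ms, 1 kHz
spectra = short_time_spectra(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(len(spectra), peak_bin * fs / 400)  # 3 1000.0
```

Each bin is fs / frame_len = 40 Hz wide here, so the 1 kHz tone peaks in bin 25.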
Front-end Signal Processing
Pre-emphasis of the spectrum at higher frequencies
The energy of speech decays at higher frequencies; pre-emphasis boosts them to compensate.
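Pre-emphasis is usually implemented as a first-order high-pass filter, y[n] = x[n] − α·x[n−1]. The sketch below assumes that form with α = 0.97, a commonly used value that the slide does not specify. Its magnitude response |1 − α·e^(−jω)| is 1 − α at ω = 0 and 1 + α at ω = π, so low frequencies are attenuated and high frequencies boosted.

```python
import cmath
import math

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through unchanged."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def gain(omega, alpha=0.97):
    """Magnitude response |1 - alpha * e^{-j*omega}| of the pre-emphasis filter."""
    return abs(1 - alpha * cmath.exp(-1j * omega))

print(round(gain(0.0), 2), round(gain(math.pi), 2))  # 0.03 1.97
```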
Front-end Signal Processing
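Mel-filter bank: the spectrum of frame t passes through M filters on the mel scale; the output of each filter is summed (SUM) to give Y_t(1), Y_t(2), …, Y_t(M), which form the feature vector of frame t.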
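The slide does not say how the filters are built. The sketch below assumes the common construction: triangular filters spaced evenly on the mel scale, mel(f) = 2595·log10(1 + f/700), with each Y_t(j) taken as the filter-weighted sum of the power spectrum. The function names and the 26-filter, 512-point, 16 kHz settings are hypothetical.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale.
    Returns num_filters weight lists over the n_fft // 2 + 1 spectrum bins."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bin_points = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, centre, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        weights = [0.0] * n_bins
        for k in range(left, centre):       # rising edge
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge, peak 1.0 at the centre
            weights[k] = (right - k) / (right - centre)
        filters.append(weights)
    return filters

def filter_bank_features(spectrum, filters):
    """Y_t(j) = sum_k H_j(k) * |X_t(k)|^2 for j = 1..M: one feature vector per frame."""
    return [sum(w * s * s for w, s in zip(weights, spectrum)) for weights in filters]

filters = mel_filter_bank(26, 512, 16000)
features = filter_bank_features([1.0] * 257, filters)  # flat test spectrum
print(len(features))  # 26
```

Low filters are narrow and high filters wide, mirroring the coarser frequency resolution of human hearing at high frequencies.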
Acoustic Modeling -- HMMs
Hidden Markov Models (HMMs):
x: hidden states
y: observable outputs
a: transition probabilities
b: output probabilities
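To make the four ingredients concrete, here is a minimal sketch of a discrete HMM used as a generator: hidden states x move according to the transition probabilities a, and each visited state emits an observable output y according to b. The 3-state left-to-right topology, the symbols "A"/"B"/"C", and all probabilities are hypothetical; real acoustic models emit continuous feature vectors rather than symbols.

```python
import random

states = [0, 1, 2]                 # x: hidden states
symbols = ["A", "B", "C"]          # y: observable outputs
pi = [1.0, 0.0, 0.0]               # always start in state 0
a = [[0.6, 0.4, 0.0],              # a[i][j] = P(next state j | state i)
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
b = [[0.8, 0.1, 0.1],              # b[j][k] = P(output symbols[k] | state j)
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

def sample(length, rng):
    """Generate (hidden state sequence, observation sequence) from the HMM."""
    q = rng.choices(states, weights=pi)[0]
    hidden, observed = [], []
    for _ in range(length):
        hidden.append(q)
        observed.append(rng.choices(symbols, weights=b[q])[0])
        q = rng.choices(states, weights=a[q])[0]
    return hidden, observed

hidden, observed = sample(8, random.Random(0))
print(observed)  # a recognizer sees only this; `hidden` stays hidden
```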
Acoustic Modeling -- Unit Selection
Unit selection for HMMs: phrases, words, syllables, or phonemes
Phoneme: the minimum unit of speech sound in a language that can distinguish one word from another
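With phonemes as units, a word model can be built by concatenating the models of its phonemes, so a small set of phoneme HMMs covers every word in the lexicon. The sketch below assumes 3 states per phoneme (a common choice, not stated on the slide) and only builds the state labels; the two lexicon entries are taken from the small example at the end of these slides, and the names are hypothetical.

```python
# Hypothetical lexicon: word -> phoneme sequence.
LEXICON = {
    "this": ["th", "ih", "s"],
    "is": ["ih", "z"],
}
STATES_PER_PHONEME = 3  # assumption: 3-state left-to-right model per phoneme

def word_model_states(word):
    """Concatenate phoneme HMMs into a word HMM; returns state labels like 'th_0'."""
    return [f"{ph}_{k}" for ph in LEXICON[word] for k in range(STATES_PER_PHONEME)]

print(word_model_states("is"))  # ['ih_0', 'ih_1', 'ih_2', 'z_0', 'z_1', 'z_2']
```

The states of "ih" appear in both words, so training data for either word improves the shared phoneme model.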
Acoustic Modeling
Observation sequence (feature vectors): each frame gives one feature vector Y_t = (Y_t(1), Y_t(2), …, Y_t(M)), and consecutive frames give the sequence Y_t, Y_{t+1}, Y_{t+2}, Y_{t+3}, Y_{t+4}, …
Acoustic Modeling
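Find the probability of observing a sequence:
observation sequence: o_1, o_2, …, o_8
state sequence: q_1, q_2, …, q_8
Each state j emits an observation o with observation probability b_j(o) (b_1(o), b_2(o), b_3(o), …).
States are linked by transition probabilities: self-loops a_11, a_22, transitions to the next state a_12, a_23, and skip transitions a_13, a_24.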
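P(O | model) is the sum of P(O, Q | model) over every possible state sequence Q. Enumerating the sequences is exponential in the length, but the forward algorithm computes the same sum in O(T·N²) by carrying α_t(j) = P(o_1 … o_t, q_t = j) from frame to frame. The sketch below assumes a discrete two-symbol alphabet and a hypothetical 3-state left-to-right model with self-loops, next-state and skip transitions as in the diagram; the brute-force sum is included only to check the result.

```python
from itertools import product

def forward(obs, pi, a, b):
    """P(O | model) by the forward algorithm."""
    n = len(pi)
    # alpha[j] = P(o_1..o_t, q_t = j)
    alpha = [pi[j] * b[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o] for j in range(n)]
    return sum(alpha)

def brute_force(obs, pi, a, b):
    """Sum of P(O, Q | model) over every state sequence Q (exponential; for checking)."""
    n, total = len(pi), 0.0
    for q in product(range(n), repeat=len(obs)):
        p = pi[q[0]] * b[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[q[t - 1]][q[t]] * b[q[t]][obs[t]]
        total += p
    return total

# Hypothetical model (0-based indices): a[0][0] is a_11, a[0][1] is a_12, a[0][2] is a_13, ...
pi = [1.0, 0.0, 0.0]
a = [[0.5, 0.3, 0.2],
     [0.0, 0.6, 0.4],
     [0.0, 0.0, 1.0]]
b = [[0.7, 0.3],   # b_j(o) for observation symbols o in {0, 1}
     [0.4, 0.6],
     [0.1, 0.9]]
obs = [0, 0, 1, 1, 1, 0, 1, 1]   # o_1 ... o_8
print(forward(obs, pi, a, b))
```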
Language Modeling
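N-gram: find the probability of a word sequence $W = (w_1, w_2, w_3, \ldots, w_i, \ldots, w_R)$:
N=1: unigram $P(w_i)$
N=2: bigram $P(w_i \mid w_{i-1})$
N=3: trigram $P(w_i \mid w_{i-2}, w_{i-1})$
N=4: four-gram $P(w_i \mid w_{i-3}, w_{i-2}, w_{i-1})$
$P(W) = P(w_1) \prod_{i=2}^{R} P(w_i \mid w_1, w_2, \ldots, w_{i-1})$
An N-gram model approximates each conditional in this product using only the previous N−1 words.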
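A minimal sketch of a bigram model, assuming maximum-likelihood estimates from counts: P(w_1) = C(w_1) / (total words) and P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1} followed by any word). The three-sentence training corpus and the function names are hypothetical.

```python
from collections import Counter

def train(sentences):
    """Count unigrams, bigram histories and bigrams in a list of word lists."""
    unigram, history, bigram = Counter(), Counter(), Counter()
    for words in sentences:
        unigram.update(words)
        history.update(words[:-1])                 # words that have a successor
        bigram.update(zip(words[:-1], words[1:]))
    return unigram, history, bigram, sum(unigram.values())

def bigram_sentence_prob(words, model):
    """P(W) ~= P(w1) * prod_{i>=2} P(w_i | w_{i-1})."""
    unigram, history, bigram, total = model
    prob = unigram[words[0]] / total               # P(w1)
    for prev, w in zip(words[:-1], words[1:]):
        if history[prev] == 0:
            return 0.0
        prob *= bigram[(prev, w)] / history[prev]  # P(w_i | w_{i-1})
    return prob

corpus = [["this", "is", "speech"], ["this", "is", "good"], ["that", "is", "speech"]]
model = train(corpus)
print(bigram_sentence_prob(["this", "is", "speech"], model))  # 2/9 * 1 * 2/3 = 4/27
```

Any bigram never seen in training gets probability zero, which is the problem smoothing addresses.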
Language Modeling
Create a phrase lexicon
Make a word graph
Class-based language modeling
Keyword spotting
Smoothing
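The slide lists smoothing without naming a method. As one simple illustration, add-one (Laplace) smoothing is sketched below: every bigram count is increased by 1 and the history count by the vocabulary size V, so unseen word pairs get a small nonzero probability and each conditional distribution still sums to 1. The corpus and names are hypothetical, and practical systems usually prefer better methods (e.g. Good-Turing or Kneser-Ney).

```python
from collections import Counter

def add_one_bigram(sentences):
    """Return prob(w, prev) = (C(prev w) + 1) / (C(prev) + V), V = vocabulary size."""
    history, bigram, vocab = Counter(), Counter(), set()
    for words in sentences:
        vocab.update(words)
        history.update(words[:-1])
        bigram.update(zip(words[:-1], words[1:]))
    V = len(vocab)

    def prob(w, prev):
        return (bigram[(prev, w)] + 1) / (history[prev] + V)

    return prob, vocab

corpus = [["this", "is", "speech"], ["this", "is", "good"], ["that", "is", "speech"]]
prob, vocab = add_one_bigram(corpus)
print(prob("speech", "is"), prob("that", "is"))  # 0.375 0.125
```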
Summary
Acoustic Models: input speech (along the time axis t) → Syllable Lattice
Lexicon: Syllable Lattice → Word Graph (candidate words w1, w2, …)
Language Models: score word sequences in the graph with P(w1) P(w2 | w1) ……
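A minimal sketch of searching a word graph, assuming each edge carries a word and a log acoustic score and that the language model is a bigram given as log-probabilities. Dynamic programming over (node, last word) pairs finds the path that maximizes acoustic score plus language-model score. The graph, all scores, and the names are hypothetical.

```python
import math

def best_path(edges, final_node, lm):
    """Best-scoring word sequence through a word graph (a DAG).

    edges: (from_node, to_node, word, acoustic_log_score), from_node < to_node,
           with node 0 as the start.
    lm(word, prev): bigram log-probability; prev is None for the first word.
    Returns (total_log_score, words) or None if final_node is unreachable."""
    # best[(node, last_word)] = (score, words); the last word is kept so the
    # bigram language model can score the next edge.
    best = {(0, None): (0.0, [])}
    for u, v, word, ac in sorted(edges):  # sorted by from_node: all paths into u are done
        for (node, prev), (score, words) in list(best.items()):
            if node != u:
                continue
            cand = score + ac + lm(word, prev)
            if (v, word) not in best or cand > best[(v, word)][0]:
                best[(v, word)] = (cand, words + [word])
    finals = [val for (node, _), val in best.items() if node == final_node]
    return max(finals) if finals else None

BIGRAM = {(None, "this"): 0.5, (None, "these"): 0.5,
          ("this", "is"): 0.6, ("these", "is"): 0.01,
          ("is", "speech"): 0.3, ("is", "peach"): 0.01}

def lm(word, prev):
    return math.log(BIGRAM.get((prev, word), 1e-6))

edges = [(0, 1, "this", -1.0), (0, 1, "these", -0.9),
         (1, 2, "is", -0.5),
         (2, 3, "speech", -1.2), (2, 3, "peach", -1.1)]
score, words = best_path(edges, 3, lm)
print(words)  # ['this', 'is', 'speech']
```

The acoustic scores alone would prefer "these is peach"; the language model overrules them.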
Small Example
Example input sequence: this is speech
Acoustic models: (th-ih-s-ih-z-s-p-iy-ch)
Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
Language models: (this) (is) (speech) → P(this) P(is | this) P(speech | this is)
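The example can be traced in code: split the phoneme string into lexicon words, then score each candidate word sequence with a trigram language model, P(this) P(is | this) P(speech | this is). The sketch assumes the acoustic models output the lexicon pronunciation s-p-iy-ch for "speech"; the trigram probabilities are hypothetical numbers chosen only for illustration.

```python
PHONES = ["th", "ih", "s", "ih", "z", "s", "p", "iy", "ch"]  # acoustic model output
LEXICON = {("th", "ih", "s"): "this", ("ih", "z"): "is", ("s", "p", "iy", "ch"): "speech"}

def segmentations(phones):
    """All ways to split the phoneme sequence into lexicon words."""
    if not phones:
        return [[]]
    results = []
    for key, word in LEXICON.items():
        if tuple(phones[:len(key)]) == key:
            for rest in segmentations(phones[len(key):]):
                results.append([word] + rest)
    return results

# Hypothetical trigram probabilities, keyed by (history, word); history is at most 2 words.
TRIGRAM = {((), "this"): 0.1, (("this",), "is"): 0.4, (("this", "is"), "speech"): 0.05}

def lm_prob(words):
    """P(w1) P(w2 | w1) P(w3 | w1, w2) ... using at most two words of history."""
    p = 1.0
    for i, w in enumerate(words):
        history = tuple(words[max(0, i - 2):i])
        p *= TRIGRAM.get((history, w), 0.0)
    return p

candidates = segmentations(PHONES)
best = max(candidates, key=lm_prob)
print(best, lm_prob(best))
```

With a larger lexicon the same phoneme string could split into several word sequences, and the language model would choose among them.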