Speech Processing2



    Speech Recognition


Simplest method -- directly compare the input signal with the reference signals,

E = (input speech) - (reference speech),

and find the optimum (minimum-error) reference speech.

Difficulties:

Utterances of the same speech differ from time to time.

It is impractical for a large database.
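As a rough illustration, a minimal sketch of this direct comparison, assuming both utterances have already been converted to feature-vector sequences of equal length (the function names are illustrative, not from the slides):

```python
import numpy as np

def template_error(input_feats, reference_feats):
    """E = sum of frame-by-frame distances between the input and one
    reference, both given as (T x D) feature-vector sequences."""
    diff = np.asarray(input_feats) - np.asarray(reference_feats)
    return float(np.sum(np.linalg.norm(diff, axis=1)))

def best_reference(input_feats, references):
    """Pick the reference speech with the smallest error E."""
    return min(references, key=lambda name: template_error(input_feats, references[name]))
```

In practice the input and a reference rarely have the same length, which is part of the stated difficulty; aligning them would need something like dynamic time warping.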


    Speech Recognition

Solve as a pattern recognition problem:

[Block diagram: the input signal and the reference signals each go through feature extraction; the resulting feature vector sequences are compared by pattern matching against the reference patterns, and decision making produces the output.]


Basic Approach

A simplified block diagram --

[Block diagram: input speech goes through front-end signal processing to produce feature vectors; linguistic decoding and a search algorithm combine the acoustic models, the lexicon, and the language model to produce the output sentence. The acoustic models are obtained by acoustic model training on speech corpora, the lexicon comes from a lexical knowledge-base, and the language model is built by language model construction from text corpora and a grammar.]



    Front-end Signal Processing

End-point detection (speech/silence discrimination)

Noise reduction:

clean signal = (input signal) - (environmental noise)

How to estimate the environmental noise?

Simplest way: treat the first 10 frames as noise.
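A rough sketch of the "first 10 frames are noise" idea, applied as subtraction of the average noise magnitude spectrum; the frame count, input layout, and floor value are illustrative assumptions, not from the slides:

```python
import numpy as np

def denoise(frames_mag, noise_frames=10, floor=1e-3):
    """Estimate the environmental noise from the first `noise_frames`
    frames and subtract it from every frame: clean = input - noise.

    frames_mag: (num_frames, num_bins) magnitude spectra of the input.
    """
    noise_estimate = frames_mag[:noise_frames].mean(axis=0)
    clean = frames_mag - noise_estimate
    return np.maximum(clean, floor)   # keep magnitudes from going negative
```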


    Front-end Signal Processing

Windowing

[Diagram: the signal is divided into overlapping frames; each frame is multiplied by a window function and transformed with a DFT.]
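A minimal sketch of this framing, windowing, and DFT step; the 400-sample frame and 160-sample hop correspond to 25 ms / 10 ms at 16 kHz, but those numbers are only assumptions for illustration:

```python
import numpy as np

def frames_to_spectra(signal, frame_len=400, hop=160):
    """Cut the signal into overlapping frames, apply a Hamming window to
    each frame, and take its DFT."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    spectra = [np.fft.rfft(signal[i * hop : i * hop + frame_len] * window)
               for i in range(num_frames)]
    return np.array(spectra)   # shape: (num_frames, frame_len // 2 + 1)
```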


    Front-end Signal Processing

Pre-emphasis of the spectrum at higher frequencies

[Figure: the speech spectrum decays at higher frequencies; pre-emphasis boosts them to compensate.]
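A minimal sketch of a pre-emphasis filter; the coefficient 0.97 is a common choice but is assumed here, not taken from the slides:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1],
    boosting the high frequencies that the speech spectrum loses."""
    x = np.asarray(signal, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])
```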


    Front-end Signal Processing

Mel-Filter Bank

[Diagram: the spectrum of each frame passes through a bank of M mel-scaled filters; each filter weights and SUMs the spectrum bins, giving the feature vector components Y_t(1), Y_t(2), ..., Y_t(M).]
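A minimal sketch of the SUM step, assuming the triangular mel filter weights are already available as a matrix (constructing that matrix is omitted here):

```python
import numpy as np

def mel_filter_bank(power_spectrum, filters):
    """Apply M mel filters to one frame's spectrum: each filter weights
    and sums the spectrum bins, producing Y_t(1), ..., Y_t(M).

    power_spectrum: (num_bins,) spectrum of one frame.
    filters: (M, num_bins) mel filter weights.
    """
    return filters @ power_spectrum   # feature vector of length M
```

In a typical front-end a log compression (and often a DCT) would follow, though the slide only shows the summation.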



    Acoustic Modeling -- HMMs

Hidden Markov Models (HMMs)

x : hidden states
y : observable outputs
a : transition probabilities
b : output probabilities
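To make the notation concrete, a toy discrete HMM with made-up numbers (purely illustrative; a real acoustic model has many more states and continuous output densities):

```python
import numpy as np

# x : 2 hidden states, y : 3 observable output symbols.
a = np.array([[0.7, 0.3],         # a[i][j] = P(state j at t+1 | state i at t)
              [0.4, 0.6]])
b = np.array([[0.5, 0.4, 0.1],    # b[i][k] = P(output k | state i)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])         # initial state probabilities
```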


    Acoustic Modeling -- Unit Selection

Unit selection for HMMs -- phrases, words, syllables, or phonemes.

Phoneme -- the minimum unit of speech sound in a language that can serve to distinguish one word from another.


    Acoustic Modeling

[Diagram: the observation sequence is the sequence of feature vectors over time, Y_t, Y_t+1, Y_t+2, ..., each with M components Y(1), Y(2), ..., Y(M).]


    Acoustic Modeling

Find the probability of observing a sequence.

[Diagram: an observation sequence o1, o2, ..., o8 is aligned with a state sequence q1, q2, ..., q8; each state emits its observation with observation probability b_i(o), and the states are connected by transition probabilities such as a12, a13, a23, a24.]
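One standard way to compute this probability is the forward algorithm, which sums over all possible state sequences; a minimal self-contained sketch with made-up model parameters:

```python
import numpy as np

def forward_probability(obs, pi, a, b):
    """P(o1, ..., oT) under a discrete HMM, summing over all state
    sequences (the forward algorithm). obs is a list of output indices."""
    alpha = pi * b[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o1)
    for o in obs[1:]:
        alpha = (alpha @ a) * b[:, o]    # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(o_t)
    return float(alpha.sum())

# Tiny made-up model: 2 states, 3 output symbols.
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward_probability([0, 2, 1], pi, a, b))
```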



    Language Modeling

N-gram -- find the probability of a word sequence W = (w1, w2, w3, ..., wi, ..., wR):

P(W) = P(w1) ∏_{i=2}^{R} P(wi | w1, w2, ..., wi-1)

N=1 : unigram   P(wi)
N=2 : bigram    P(wi | wi-1)
N=3 : tri-gram  P(wi | wi-2, wi-1)
N=4 : four-gram P(wi | wi-3, wi-2, wi-1)
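A minimal sketch of the chain rule with the bigram (N = 2) approximation; the probability values are made up for illustration:

```python
def sentence_probability(words, unigram, bigram):
    """P(W) ~= P(w1) * product over i >= 2 of P(wi | wi-1)."""
    p = unigram[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= bigram[(prev, cur)]
    return p

# Illustrative, made-up probabilities.
unigram = {"this": 0.05}
bigram = {("this", "is"): 0.20, ("is", "speech"): 0.01}
print(sentence_probability(["this", "is", "speech"], unigram, bigram))
```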


    Language Modeling

    Create a phrase lexicon

    Make a word graph

    Class-based language modeling

    Keyword spotting

    Smoothing
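For the smoothing item above, one common technique is add-one (Laplace) smoothing of bigram counts; a minimal sketch with illustrative counts (the slides do not say which smoothing method is used):

```python
def add_one_bigram(bigram_counts, unigram_counts, vocab_size):
    """P(w | w_prev) = (C(w_prev, w) + 1) / (C(w_prev) + V), so unseen
    word pairs still receive a small nonzero probability."""
    def prob(prev, cur):
        return (bigram_counts.get((prev, cur), 0) + 1) / \
               (unigram_counts.get(prev, 0) + vocab_size)
    return prob

# Illustrative counts from a tiny corpus.
p = add_one_bigram({("this", "is"): 3}, {"this": 5}, vocab_size=1000)
print(p("this", "is"), p("this", "was"))   # seen pair vs unseen pair
```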


    Summary

[Diagram: the acoustic models produce a syllable lattice over time t; with the lexicon it is expanded into a word graph of candidate words w1, w2, ..., and the language models score the paths with probabilities P(w1)P(w2 | w1)... .]


    Small Example

Example input sequence: "this is speech"

Acoustic models: (th-ih-s-ih-z-s-p-ih-ch)

Lexicon:
(th-ih-s)    this
(ih-z)       is
(s-p-iy-ch)  speech

Language models: (this) (is) (speech)

P(this) P(is | this) P(speech | this is)
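Putting the example together as a minimal sketch; the phone strings and word sequence come from the slide, while the probability numbers are made up purely for illustration:

```python
# Lexicon from the slide: phone sequences map to words.
lexicon = {
    "th-ih-s": "this",
    "ih-z": "is",
    "s-p-iy-ch": "speech",
}

# Words recovered from the recognized phone sequences.
words = [lexicon[p] for p in ["th-ih-s", "ih-z", "s-p-iy-ch"]]

# Illustrative (made-up) language-model probabilities.
p_this = 0.05                  # P(this)
p_is_given_this = 0.20         # P(is | this)
p_speech_given_this_is = 0.01  # P(speech | this is)

score = p_this * p_is_given_this * p_speech_given_this_is
print(" ".join(words), score)
```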