Speech Processing2



    Speech Recognition


Simplest method -- directly compare the input signal with the reference signals,

E = (input speech) - (reference speech),

and find the optimum (minimum-error) reference speech.

Difficulties:

Utterances of the same speech differ from time to time.

It is impractical for a large database.
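As a rough illustration, a minimal sketch of this direct comparison, assuming both utterances have already been converted to feature-vector sequences of equal length (the function names are illustrative, not from the slides):

```python
import numpy as np

def template_error(input_feats, reference_feats):
    """E = sum of frame-by-frame distances between the input and one
    reference, both given as (T x D) feature-vector sequences."""
    diff = np.asarray(input_feats) - np.asarray(reference_feats)
    return float(np.sum(np.linalg.norm(diff, axis=1)))

def best_reference(input_feats, references):
    """Pick the reference speech with the smallest error E."""
    return min(references, key=lambda name: template_error(input_feats, references[name]))
```

In practice the input and a reference rarely have the same length, which is part of the stated difficulty; aligning them would need something like dynamic time warping.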


    Speech Recognition

Solve as a pattern recognition problem:

[Block diagram: the input signal and the reference signals each go through feature extraction; the resulting feature vector sequences are compared by pattern matching against the reference patterns, and decision making produces the output.]


Basic Approach

A simplified block diagram --

[Block diagram: input speech goes through front-end signal processing to produce feature vectors; linguistic decoding and a search algorithm combine the acoustic models, the lexicon, and the language model to produce the output sentence. The acoustic models are obtained by acoustic model training on speech corpora, the lexicon comes from a lexical knowledge-base, and the language model is built by language model construction from text corpora and a grammar.]



    Front-end Signal Processing

End-point detection (speech/silence discrimination)

Noise reduction:

clean signal = (input signal) - (environmental noise)

How to estimate the environmental noise?

Simplest way: treat the first 10 frames as noise.
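A rough sketch of the "first 10 frames are noise" idea, applied as subtraction of the average noise magnitude spectrum; the frame count, input layout, and floor value are illustrative assumptions, not from the slides:

```python
import numpy as np

def denoise(frames_mag, noise_frames=10, floor=1e-3):
    """Estimate the environmental noise from the first `noise_frames`
    frames and subtract it from every frame: clean = input - noise.

    frames_mag: (num_frames, num_bins) magnitude spectra of the input.
    """
    noise_estimate = frames_mag[:noise_frames].mean(axis=0)
    clean = frames_mag - noise_estimate
    return np.maximum(clean, floor)   # keep magnitudes from going negative
```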


    Front-end Signal Processing

Windowing

[Diagram: the signal is divided into overlapping frames; each frame is multiplied by a window function and transformed with a DFT.]
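A minimal sketch of this framing, windowing, and DFT step; the 400-sample frame and 160-sample hop correspond to 25 ms / 10 ms at 16 kHz, but those numbers are only assumptions for illustration:

```python
import numpy as np

def frames_to_spectra(signal, frame_len=400, hop=160):
    """Cut the signal into overlapping frames, apply a Hamming window to
    each frame, and take its DFT."""
    signal = np.asarray(signal, dtype=float)
    window = np.hamming(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    spectra = [np.fft.rfft(signal[i * hop : i * hop + frame_len] * window)
               for i in range(num_frames)]
    return np.array(spectra)   # shape: (num_frames, frame_len // 2 + 1)
```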


    Front-end Signal Processing

Pre-emphasis of the spectrum at higher frequencies

[Figure: the speech spectrum decays at higher frequencies; pre-emphasis boosts them to compensate.]
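A minimal sketch of a pre-emphasis filter; the coefficient 0.97 is a common choice but is assumed here, not taken from the slides:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1],
    boosting the high frequencies that the speech spectrum loses."""
    x = np.asarray(signal, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])
```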


    Front-end Signal Processing

Mel-Filter Bank

[Diagram: the spectrum of each frame passes through a bank of M mel-scaled filters; each filter weights and SUMs the spectrum bins, giving the feature vector components Y_t(1), Y_t(2), ..., Y_t(M).]
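A minimal sketch of the SUM step, assuming the triangular mel filter weights are already available as a matrix (constructing that matrix is omitted here):

```python
import numpy as np

def mel_filter_bank(power_spectrum, filters):
    """Apply M mel filters to one frame's spectrum: each filter weights
    and sums the spectrum bins, producing Y_t(1), ..., Y_t(M).

    power_spectrum: (num_bins,) spectrum of one frame.
    filters: (M, num_bins) mel filter weights.
    """
    return filters @ power_spectrum   # feature vector of length M
```

In a typical front-end a log compression (and often a DCT) would follow, though the slide only shows the summation.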



    Acoustic Modeling -- HMMs

Hidden Markov Models (HMMs)

x : hidden states
y : observable outputs
a : transition probabilities
b : output probabilities
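To make the notation concrete, a toy discrete HMM with made-up numbers (purely illustrative; a real acoustic model has many more states and continuous output densities):

```python
import numpy as np

# x : 2 hidden states, y : 3 observable output symbols.
a = np.array([[0.7, 0.3],         # a[i][j] = P(state j at t+1 | state i at t)
              [0.4, 0.6]])
b = np.array([[0.5, 0.4, 0.1],    # b[i][k] = P(output k | state i)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])         # initial state probabilities
```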


    Acoustic Modeling -- Unit Selection

Unit selection for HMMs -- phrases, words, syllables, or phonemes.

Phoneme -- the minimum unit of speech sound in a language that can serve to distinguish one word from another.


    Acoustic Modeling

[Diagram: the observation sequence is the sequence of feature vectors over time, Y_t, Y_t+1, Y_t+2, ..., each with M components Y(1), Y(2), ..., Y(M).]


    Acoustic Modeling

Find the probability of observing a sequence.

[Diagram: an observation sequence o1, o2, ..., o8 is aligned with a state sequence q1, q2, ..., q8; each state emits its observation with observation probability b_i(o), and the states are connected by transition probabilities such as a12, a13, a23, a24.]
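One standard way to compute this probability is the forward algorithm, which sums over all possible state sequences; a minimal self-contained sketch with made-up model parameters:

```python
import numpy as np

def forward_probability(obs, pi, a, b):
    """P(o1, ..., oT) under a discrete HMM, summing over all state
    sequences (the forward algorithm). obs is a list of output indices."""
    alpha = pi * b[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o1)
    for o in obs[1:]:
        alpha = (alpha @ a) * b[:, o]    # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij b_j(o_t)
    return float(alpha.sum())

# Tiny made-up model: 2 states, 3 output symbols.
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(forward_probability([0, 2, 1], pi, a, b))
```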



    Language Modeling

N-gram -- find the probability of a word sequence W = (w1, w2, w3, ..., wi, ..., wR):

P(W) = P(w1) ∏_{i=2}^{R} P(wi | w1, w2, ..., wi-1)

N=1 : unigram   P(wi)
N=2 : bigram    P(wi | wi-1)
N=3 : tri-gram  P(wi | wi-2, wi-1)
N=4 : four-gram P(wi | wi-3, wi-2, wi-1)
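A minimal sketch of the chain rule with the bigram (N = 2) approximation; the probability values are made up for illustration:

```python
def sentence_probability(words, unigram, bigram):
    """P(W) ~= P(w1) * product over i >= 2 of P(wi | wi-1)."""
    p = unigram[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= bigram[(prev, cur)]
    return p

# Illustrative, made-up probabilities.
unigram = {"this": 0.05}
bigram = {("this", "is"): 0.20, ("is", "speech"): 0.01}
print(sentence_probability(["this", "is", "speech"], unigram, bigram))
```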


    Language Modeling

    Create a phrase lexicon

    Make a word graph

    Class-based language modeling

    Keyword spotting

    Smoothing
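For the smoothing item above, one common technique is add-one (Laplace) smoothing of bigram counts; a minimal sketch with illustrative counts (the slides do not say which smoothing method is used):

```python
def add_one_bigram(bigram_counts, unigram_counts, vocab_size):
    """P(w | w_prev) = (C(w_prev, w) + 1) / (C(w_prev) + V), so unseen
    word pairs still receive a small nonzero probability."""
    def prob(prev, cur):
        return (bigram_counts.get((prev, cur), 0) + 1) / \
               (unigram_counts.get(prev, 0) + vocab_size)
    return prob

# Illustrative counts from a tiny corpus.
p = add_one_bigram({("this", "is"): 3}, {"this": 5}, vocab_size=1000)
print(p("this", "is"), p("this", "was"))   # seen pair vs unseen pair
```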


    Summary

[Diagram: the acoustic models produce a syllable lattice over time t; with the lexicon it is expanded into a word graph of candidate words w1, w2, ..., and the language models score the paths with probabilities P(w1)P(w2 | w1)... .]


    Small Example

Example input sequence: "this is speech"

Acoustic models: (th-ih-s-ih-z-s-p-ih-ch)

Lexicon:
(th-ih-s)    this
(ih-z)       is
(s-p-iy-ch)  speech

Language models: (this) (is) (speech)

P(this) P(is | this) P(speech | this is)
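Putting the example together as a minimal sketch; the phone strings and word sequence come from the slide, while the probability numbers are made up purely for illustration:

```python
# Lexicon from the slide: phone sequences map to words.
lexicon = {
    "th-ih-s": "this",
    "ih-z": "is",
    "s-p-iy-ch": "speech",
}

# Words recovered from the recognized phone sequences.
words = [lexicon[p] for p in ["th-ih-s", "ih-z", "s-p-iy-ch"]]

# Illustrative (made-up) language-model probabilities.
p_this = 0.05                  # P(this)
p_is_given_this = 0.20         # P(is | this)
p_speech_given_this_is = 0.01  # P(speech | this is)

score = p_this * p_is_given_this * p_speech_given_this_is
print(" ".join(words), score)
```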