Robust Speech Recognition Algorithm Against Unknown Short-Time Noise By Arthur Chan Supervised by Prof. Manhung Siu Hong Kong University of Science and Technology Copyright © by Arthur Chan 2001


Robust Speech Recognition Algorithm Against Unknown

Short-Time Noise

By Arthur Chan

Supervised by Prof. Manhung Siu

Hong Kong University of Science and Technology

Copyright © by Arthur Chan 2001


Outline

• Robust speech recognition
• HMM-based speech recognition in short-time noise
• Our proposal: skip the poor frames
  – Theory
  – Implementation: FSVA and FSHMM
• Evaluation I: Gaussian noise replacement
• Improvement of FSVA
• Evaluation II: further evidence
  – Additive short-time noise
  – Short-time noise in GSM environment
• Conclusion and future work


Robust Speech Recognition


Speech Recognition

• Speech recognition achieves acceptable performance
  – in matched training and testing conditions,
  – or when the operating conditions are known in training.
  – Digit recognition (99%).
  – Dictation (90%).
  – Performance is still improving for tasks under active research.


Mismatch Conditions

• A mismatch condition is a difference between the training and the operating (testing) environment.

• Mismatch exists in practice. For example,
  – Simple example: a sudden door slam while dictating a letter.
  – In a wireless environment: the background of the speaker can change.

• Robust speech recognition is the study of building speech recognizers that handle mismatch conditions.


Mismatch Conditions (cont.)

• Why are mismatch conditions hard to deal with?
  – They have many causes:
    • Additive noise (e.g. background noise such as air-conditioning)
    • Channel noise (e.g. differences between the microphones used in training and testing)
    • Others: Lombard effect, reflections from buildings
  – In general, noise can have
    • random amplitude,
    • random duration,
    • random occurrence,
    • random spectral characteristics.


Conventional Approach of Robust Speech Recognition

• E.g. Parallel Model Combination (PMC) (Gales, 95)
  – First collect some samples of noise in the operating environment,
  – then update the acoustic model using the noise statistics.
• Works satisfactorily for stationary noise.
• General time-varying noise cannot be handled.


Short Time Noise

• Time-limited noise.

• Usually arises in the operating environment, for example:
  – door slam,
  – keyboard clicks,
  – frame loss in network transmission of speech.


Short Time Noise (cont.)

• In this work, we define short-time noise as noise with
  – random spectral characteristics,
  – random amplitude,
  – random occurrence,
  – random duration,
  – duration shorter than the speech signal.

• Also known as partial temporal corruption (J. Ming, 2001).

• Some parts of the speech are not corrupted.


This work

• Deals with short-time noise.
• Some parts of the speech are uncorrupted.
• Uses an interesting perspective:
  – Can we ignore the contributions of the corrupted frames in the decision-making process?
  – Supported by Missing Feature Theory (Lippmann, 97).
    • We can regard the corrupted parts of the speech as missing.
    • We can ignore the missing parts in the decision.


HMM-Based Speech Recognition


Hidden Markov Model (HMM)

• A Markov model with an unobservable state sequence.
• Can also be used in other pattern recognition tasks.
• Efficient algorithms for training and testing exist.
• Example: left-to-right HMM to model speech.


Viterbi Algorithm

• Efficiently searches for the most likely state sequence that explains all observations:

$$\tilde{Q} = \arg\max_{Q \in Q^*} \log P(Q \mid O)$$

• $O$: an observation sequence, $(o_1, o_2, \ldots, o_T)$ or $O_1^T$.
• $Q$: a state sequence, $(q_1, q_2, \ldots, q_T)$ or $Q_1^T$.
• $Q^*$: the set of all possible state sequences.
• $\tilde{Q}$: the best state sequence.


Viterbi Algorithm (cont.)

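– Expressed in terms of the HMM's parameters,

$$\tilde{Q} = \arg\max_{Q_1^T \in Q^*} \log P(O_1^T \mid Q_1^T)\, P(Q_1^T)
= \arg\max_{Q_1^T \in Q^*} \left[ \log \pi_{q_1} + \sum_{t=2}^{T} \log a_{q_{t-1} q_t} + \sum_{t=1}^{T} \log b_{q_t}(o_t) \right]$$

– $\pi_{q_1}$: initial probability.
– $a_{q_{t-1} q_t}$: transition probability.
– $b_{q_t}(o_t)$: observation probability.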


Viterbi Algorithm (cont.)

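[Figure: trellis of states 1, 2 and 3 against time T = 1, 2, 3, ...]

• Efficient implementation
  – For each state $i$ and each time $t$, define the partial score

$$\delta_t(i) = \max_{Q_1^{t-1}} P(O_1^t, Q_1^{t-1}, q_t = i)$$

• Recursive formula

$$\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$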


Short-Time Noise in Viterbi Algorithm

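• Finding the best state sequence means maximizing a sum of per-frame scores. Up to a term that does not depend on $Q_1^T$,

$$\log P(Q_1^T \mid O_1^T) = \sum_{t=1}^{T} \log a_{q_{t-1} q_t} + \sum_{t=1}^{T} \log b_{q_t}(o_t)$$

  where $a_{q_0 q_1}$ stands for the initial probability of $q_1$.

• This is analogous to finding the mean using the average,

$$\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$$

  – E.g. mean of 2.2, 2.3, 2.4, 2.2 = 2.275
  – Mean of 2.2, 2.3, 2.4, 100 = 26.725

• Both are easily affected by outlier frames.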


Our Proposal


Our Proposal

• Search for the most likely state sequence while ignoring the K most poorly performing frames.

• Can be implemented efficiently, similarly to the Viterbi algorithm.

• Achieves satisfactory performance.

• Analogy: the robust mean of 2.2, 2.3, 2.4, 100 is (2.2 + 2.3 + 2.4) / 3 = 2.3.
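The robust-mean analogy can be checked numerically. The sketch below is only an illustration: the helper name `robust_mean` and the rule "drop the k values farthest from the median" are assumptions chosen to reproduce the slide's example, not a definition taken from the thesis.

```python
from statistics import mean, median

def robust_mean(values, k):
    """Mean of `values` after dropping the k entries farthest from the median."""
    centre = median(values)
    # Keep the len(values) - k entries closest to the median.
    kept = sorted(values, key=lambda v: abs(v - centre))[: len(values) - k]
    return mean(kept)

samples = [2.2, 2.3, 2.4, 100]
plain = mean(samples)             # dominated by the outlier 100
robust = robust_mean(samples, 1)  # the outlier is ignored
```

With one value dropped, the result stays close to the three consistent samples, which is the behaviour the frame-skipping search aims for at the frame level.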


Formulation : Ignore the poorest frame

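• Try to ignore the frame with the lowest likelihood, i.e.

$$\tilde{Q} = \arg\max_{Q_1^T \in Q^*} \; \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t} \prod_{t \neq t_1} b_{q_t}(o_t)$$

• Here the frames $(o_1, \ldots, o_T)$ have been rank-ordered as $(o_{t_1}, \ldots, o_{t_T})$

• such that $b_{q_{t_i}}(o_{t_i}) \le b_{q_{t_{i+1}}}(o_{t_{i+1}})$, so $o_{t_1}$ is the frame with the lowest likelihood.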


Generalization : Ignore the poorest K frames

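• The robust likelihood $\Lambda_K$ is defined so that it skips the K frames with the lowest likelihood. With $t_1, \ldots, t_T$ indexing the frames in increasing order of likelihood,

$$\Lambda_K(Q_1^T \mid O_1^T) = \sum_{i=K+1}^{T} \log b_{q_{t_i}}(o_{t_i}) + \sum_{i=1}^{T} \log a_{q_{i-1}, q_i}
= \sum_{t_i \notin \{t_1, \ldots, t_K\}} \log b_{q_{t_i}}(o_{t_i}) + \sum_{i=1}^{T} \log a_{q_{i-1}, q_i}$$

– The alignment information is still maintained (the transition term is unchanged).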
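For one fixed state sequence, the definition above amounts to sorting the per-frame observation scores and dropping the K smallest, while keeping every transition term. A minimal sketch, assuming the scores along the path have already been collected into plain lists (the function and argument names are hypothetical):

```python
def robust_log_likelihood(log_obs, log_trans, k):
    """Robust log-likelihood of one fixed state sequence.

    log_obs   : log b_{q_t}(o_t) for each frame t along the path
    log_trans : log transition terms along the path (all of them are kept)
    k         : number of lowest-scoring frames to ignore
    """
    kept = sorted(log_obs)[k:]  # drop the k smallest observation scores
    return sum(kept) + sum(log_trans)

# One badly matching frame (-50.0) dominates the ordinary score (k = 0)
# but is ignored once a single skip is allowed (k = 1).
obs = [-1.0, -2.0, -50.0, -1.5]
trans = [-0.5, -0.5, -0.5, -0.5]
ordinary = robust_log_likelihood(obs, trans, 0)
robust = robust_log_likelihood(obs, trans, 1)
```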


Generalization :

• Speech recognition becomes the problem of finding the state sequence with the best robust likelihood,

$$\tilde{Q}_1^T = \arg\max_{Q_1^T \in Q^*} \Lambda_K(Q_1^T \mid O_1^T)$$


Alternative Formulation

• For every state sequence, consider all possible patterns of corruption of K frames among T frames.
• There are $C_K^T$ of them in total. Denote them as $(l_1, l_2, \ldots, l_{C_K^T})$.
• For each pattern, $l_i = (o_{i_1}, o_{i_2}, \ldots, o_{i_{T-K}})$ is the set of uncorrupted frames in that pattern.
• E.g. T = 4, K = 2 has the following patterns of corruption:
  – Frames 1 and 2,
  – Frames 1 and 3,
  – Frames 1 and 4,
  – Frames 2 and 3,
  – Frames 2 and 4,
  – Frames 3 and 4.


Alternative Formulation

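• The robust likelihood can alternatively be defined as

$$\Lambda_K(Q_1^T \mid O_1^T) = \max_{i = 1, \ldots, C_K^T} \log p(l_i \mid Q_1^T) + \sum_{j=1}^{T} \log a_{q_{j-1}, q_j}
= \max_{i = 1, \ldots, C_K^T} \sum_{j=1}^{T-K} \log p(o_{i_j} \mid q_{i_j}) + \sum_{j=1}^{T} \log a_{q_{j-1}, q_j}$$

• Extended Union Model (EUM) probability (J. Ming):

$$\Lambda_K^{\mathrm{EUM}}(Q_1^T \mid O_1^T) = \log \sum_{i=1}^{C_K^T} p(l_i \mid Q_1^T) + \sum_{j=1}^{T} \log a_{q_{j-1}, q_j}$$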
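The alternative formulation enumerates every corruption pattern, whereas the earlier one simply drops the K lowest-scoring frames. The sketch below, with hypothetical helper names and made-up scores, lists the patterns for T = 4, K = 2 and compares the two ways of computing the observation part of the robust likelihood for one fixed state sequence.

```python
from itertools import combinations

def corruption_patterns(n_frames, k):
    """All ways to mark k of n_frames frames (numbered from 1) as corrupted."""
    return list(combinations(range(1, n_frames + 1), k))

def robust_obs_score_by_enumeration(log_obs, k):
    """Max over corruption patterns of the summed scores of the kept frames."""
    n = len(log_obs)
    best = float("-inf")
    for pattern in corruption_patterns(n, k):
        kept = sum(log_obs[t - 1] for t in range(1, n + 1) if t not in pattern)
        best = max(best, kept)
    return best

def robust_obs_score_by_sorting(log_obs, k):
    """Same quantity, obtained by dropping the k smallest scores."""
    return sum(sorted(log_obs)[k:])

patterns = corruption_patterns(4, 2)   # the six patterns listed on the slide
scores = [-1.0, -40.0, -2.0, -35.0]    # illustrative per-frame log scores
```

The enumeration is useful as a reference for testing; the sorted version is what makes the definition cheap for a single path.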


Missing Feature Theory Interpretation

• The above formulation relates to Missing Feature Theory, which suggests:
  – If a feature is corrupted, we can just ignore it.
  – Example: multi-band ASR assumes band-limited noise (frequency limited).
  – Similarly, our idea assumes the noise is short-time in nature (time limited).


Direct Implementation

• Exhaustively neglect K frames for every state sequence
  – Very expensive.
  – For each state sequence, $C_K^T (T - K)$ additions are required.
  – Intractable for useful values of T and K.
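To get a feel for how quickly the direct approach grows, the addition count per state sequence can be evaluated with Python's `math.comb`. The function name and the example sizes are illustrative only.

```python
from math import comb

def direct_additions(n_frames, k):
    """Additions per state sequence when every pattern is scored separately:
    C(n_frames, k) patterns, each summing n_frames - k kept frames."""
    return comb(n_frames, k) * (n_frames - k)

small = direct_additions(4, 2)      # the T = 4, K = 2 toy case
large = direct_additions(100, 10)   # a short utterance already explodes
```

This count applies to each state sequence, and the number of state sequences itself grows exponentially with T, which is why an exhaustive search is impractical.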


Previous Attempts to tackle the Computation Burden

• Let us look at an attempt that deals with the EUM.
• J. Ming et al. (2001)
  – N-best re-scoring paradigm.
  – An approximate model based on segments (runs of consecutive frames) is used.
  – Corruption in a few frames is then regarded as corruption of a whole segment.

• A more efficient algorithm is desirable.


Efficient Implementation of a Viterbi Algorithm that Skips Frames

• Two approaches
  – Topological-space expansion approach
    • using FSHMM,
    • using terminology similar to HMM.
  – State-space expansion approach
    • Modify the Viterbi algorithm directly.


Topological Space Expansion

• Frame-Skipping HMM (FSHMM)
• Skipping state
  – Consumes one observation vector.
  – Generates a constant only.
  – Example:

[Figure: non-skipping version (state 1 only) and frame-skipping version (state 1 plus skip state 1_s)]


Left-to-right HMM (FS version)

[Figure: frame-skipping left-to-right HMM; legend: non-skip state, skip state]


Implementation of Topological-Space Expansion Approach

• Memory usage is (2N+1) times that of the Viterbi algorithm.

• Can be implemented with standard HMM software (e.g. HTK).

• Hard to generalize to continuous word recognition
  – A huge HMM needs to be constructed.


State-Space Expansion approach

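• The general idea
  – Augment each state with K additional scores when up to K frames may be skipped.
  – In updates from the previous skip level, we ignore the contribution of the observation probability.
  – E.g.

[Figure: non-skipping version with states 1, 2, 3; skipping version with states 1_0, 1_1, 2_0, 2_1, 3_0, 3_1. Arc labels: $a_{ij}$ (frame skipped) and $a_{ij} b_j(o_t)$ (frame kept).]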


Update Formula

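• We can prove a recursion for the partial robust likelihood.
• Define the partial score (robust likelihood) of state $j$ at time $t$ with $k$ skips as

$$\delta_t(j, k) = \max\left( \delta_t^{\text{skip}}(j, k),\; \delta_t^{\text{non-skip}}(j, k) \right)
= \max_i \left[ a_{ij} \max\left( \delta_{t-1}(i, k-1),\; \delta_{t-1}(i, k)\, b_j(o_t) \right) \right]$$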


Proof of Update Formula

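With transition terms omitted and scores written in the probability domain, the partial robust likelihood of the first $t$ frames with $k$ skips is

$$\Lambda_k(Q_1^t \mid O_1^t) = \max_{i = 1, \ldots, C_k^t} p(l_i \mid Q_1^t)
= \max\left( \max_{l \in L_1} p(l \mid Q_1^t),\; \max_{l \in L_2} p(l \mid Q_1^t) \right)$$

– $L_1$ is the set of corruption patterns in which the $t$-th frame is skipped.
– $L_2$ is the set of corruption patterns in which the $t$-th frame is not skipped.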


Proof of Update Formula (cont.)

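$$\Lambda_k(Q_1^t \mid O_1^t) = \max\left( \max_{l \in L_1} p(l \mid Q_1^t),\; \max_{l \in L_2} p(l \mid Q_1^t) \right)
= \max\left( \Lambda_{k-1}(Q_1^{t-1} \mid O_1^{t-1}),\; b_{q_t}(o_t)\, \Lambda_k(Q_1^{t-1} \mid O_1^{t-1}) \right)$$

– Check the cardinality (size) of the two sets:

$$|L_1| = C_{k-1}^{t-1}, \qquad |L_2| = C_k^{t-1}, \qquad |L_1 \cup L_2| = C_k^t$$

$$|L_1 \cup L_2| = |L_1| + |L_2| \quad \text{(Pascal's formula)}$$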
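As a small worked check of the counting argument, take the illustrative case of t = 4 frames and k = 2 skips, where a corruption pattern is written as the set of skipped frame numbers. Splitting the patterns by whether the last frame is skipped gives:

```latex
% Worked check with t = 4 frames and k = 2 skips (illustrative values)
L_1 = \{\{1,4\},\{2,4\},\{3,4\}\}, \quad |L_1| = C_{1}^{3} = 3 \\
L_2 = \{\{1,2\},\{1,3\},\{2,3\}\}, \quad |L_2| = C_{2}^{3} = 3 \\
|L_1| + |L_2| = 3 + 3 = 6 = C_{2}^{4}
```

The two sets are disjoint and together cover all six patterns, so taking the maximum over each set separately loses nothing.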


Frame Skipping Viterbi Algorithm (FSVA)

• Transition probabilities can easily be incorporated in the above formula.

• The resulting update formula is called the FSVA.

• A similar idea can be used to compute the probability of the extended union model (EUM).


FSVA (cont.)

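• Update formula

$$\delta_t(j, k) = \max\left( \delta_t^{s}(j, k),\; \delta_t^{n}(j, k) \right)
= \max_i \left[ a_{ij} \max\left( \delta_{t-1}(i, k-1),\; \delta_{t-1}(i, k)\, b_j(o_t) \right) \right]$$

  – $\delta_{t-1}(i, k-1)$: updated from skip level $k-1$ (the current frame is skipped).
  – $\delta_{t-1}(i, k)\, b_j(o_t)$: updated from skip level $k$ (the current frame is kept).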


Implementation II (State-Space expansion approach)

• Similar to the exact N-best algorithm.

• Memory usage: N times that of the normal Viterbi algorithm.

• With caching of observation probabilities, the computation is quite similar to that of the normal Viterbi algorithm.


Evaluation I:Gaussian Noise Replacement


Evaluation I(Objective)

• To determine the usefulness of FSVA.


Evaluation I (Conditions)

• Baseline
  – Corpus: TIDIGITS (adults), train 8668, test 8668
  – Training: 12 MFCCs + delta + delta-delta + energy = 39 features
  – Testing results (accuracy, %)
    • 99.72 (isolated digit recognition),
    • 98.90 (connected digit recognition) (untuned)


Evaluation I (Conditions) (cont.)

• Corruption is simulated
  – 10% of the frames in each testing utterance are removed and replaced by a frame of
    • Gaussian noise
    • with constant energy level.
  – A clean model is used for testing.
  – Testing results using a left-to-right HMM
    • 85.34% (isolated digit recognition),
    • 78.83% (continuous digit recognition)
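A replacement corruption of this kind could be simulated roughly as below. The sketch works on generic per-frame vectors; the function name `corrupt_frames`, the zero-mean Gaussian with a fixed standard deviation, and the uniform random choice of frames are all assumptions, since the slides do not specify the domain or the parameters of the noise.

```python
import random

def corrupt_frames(frames, rate=0.10, noise_std=1.0, seed=0):
    """Replace a fraction `rate` of the frames by Gaussian noise frames.

    Returns the corrupted copy and the sorted indices of the replaced frames.
    The noise model (zero mean, fixed std) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_corrupt = round(rate * len(frames))
    corrupted_idx = sorted(rng.sample(range(len(frames)), n_corrupt))
    out = [list(f) for f in frames]  # copy, leave the input untouched
    for t in corrupted_idx:
        out[t] = [rng.gauss(0.0, noise_std) for _ in frames[t]]
    return out, corrupted_idx
```

Returning the replaced indices makes it possible to check afterwards how many of the skipped frames were actually corrupted.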


Experiment I (Results)

• Using FSVA

            Acc (%)   Skip
CDR Clean    98.97      2
CDR Noisy    93.71     28
IDR Clean    98.47     20
IDR Noisy    99.76      2

[Figure: accuracy (70 to 100) against number of skips (1 to 55) for CDR noisy (0.1), CDR clean, IDR noisy (0.1) and IDR clean. Annotations: IDR +88%, CDR +70%.]

But: we are not happy!
  – Performance degrades on clean speech.
  – It is hard to determine the best number of skips if the condition is unknown.
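The "+70%" annotation for CDR is consistent with a relative reduction of the error rate, assuming that is what the annotation denotes. A short check using the noisy CDR baseline (78.83%) and the best noisy CDR accuracy with FSVA (93.71%) from the slides; the function name is hypothetical:

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline error rate removed (accuracies in percent)."""
    return (new_acc - baseline_acc) / (100.0 - baseline_acc)

cdr_gain = relative_error_reduction(78.83, 93.71)  # roughly 0.70
```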


How many corrupted frames are skipped? - An Analysis

• Define
  – $A$: all frames.
  – $C$: set of corrupted frames.
  – $U = A \setminus C$: set of uncorrupted frames.
  – $H$: set of detected frames, or hit frames.

• Then the likelihood ratio $P(H \mid C) / P(H \mid U)$ is found to be 10.

• We skip mostly corrupted frames.
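Given the frame sets defined above, the ratio can be estimated by simple counting. The sketch below uses Python sets of frame indices; the function name and the toy numbers are illustrative and are not the data behind the reported value.

```python
def hit_likelihood_ratio(all_frames, corrupted, hits):
    """Estimate P(H | C) / P(H | U) from sets of frame indices."""
    uncorrupted = all_frames - corrupted
    p_hit_given_c = len(hits & corrupted) / len(corrupted)
    p_hit_given_u = len(hits & uncorrupted) / len(uncorrupted)
    return p_hit_given_c / p_hit_given_u

# Toy example: 100 frames, 10 corrupted, 10 skipped, 5 of the skips hit corruption.
A = set(range(100))
C = set(range(10))
H = set(range(5)) | set(range(50, 55))
ratio = hit_likelihood_ratio(A, C, H)
```

A ratio well above 1 means a skipped frame is far more likely to be a corrupted one than a clean one.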


How much can be gained from FSVA? – 2nd Analysis

• Performance of FSVA when, for each sentence, the number of skips that gives the lowest WER is used
  – 99.72 (isolated), 97.66 (continuous)

• Still room for improvement
  – Longer sentences require more skips to recover.

• E.g. (observed from data)

111.wav

-SIL 1 1 SIL (from skip 1 to 5)

-SIL 1 1 1 SIL(from skip 6 to 29)

-SIL 3 1 1 SIL(from skip 29 to 57)

….

24z982z.wav

-SIL 2 z o 9 8 2 o SIL (from skip 1 to 4)

-SIL 2 4 z o 9 8 2 o SIL(from skip 5 to 22)

-SIL 2 4 z o 9 8 2 z o SIL(from skip 23 to 36)

-SIL 2 4 z o 9 8 2 z SIL (from skip 37 to 57)

….


Observations from Evaluation I

• It is difficult to determine the number of skips because of two factors:
  – the condition (rate of corruption) is unknown,
  – the length of the sentence is unknown.

• Memory issue: N times that of the standard Viterbi algorithm.


Improvements of FSVA


Improvements of FSVA :

• We present solutions to two problems:
  – Skip determination
    • An automatic skip determination mechanism is presented.
  – Memory problem (related to skip determination)
    • An approximate algorithm is presented.
    • Preliminary results are presented.


Improvements of FSVA: Automatic Skip Determination

• This is a hard problem; the right number of skips depends on
  – the length of the utterance,
  – the rate of corruption.

• With known corruption rate and length of corruption,
  – skipping a fixed number of frames may be the most intuitive choice.

• In general, these conditions are unknown.
  – Ideally, we seek a method that requires no prior knowledge of the environment.


Improvements of FSVA: Automatic Skip Determination (cont.)

• Idea: Log Likelihood Ratio Thresholding (LLRT)
  – Stop the skipping process by testing the ratio of likelihoods.

• Why does it work?
  – In general, the robust likelihood is non-decreasing in K,
  – because each additional skip removes one more frame contribution from the criterion function:

$$\Lambda_K(\tilde{Q}(K) \mid O) \ge \Lambda_{K-1}(\tilde{Q}(K-1) \mid O)$$

  where $\tilde{Q}(K)$ is the best state sequence when K frames are skipped.


Improvements of FSVA: Automatic Skip Determination (cont.)

• The improvement
  – is measured as a likelihood ratio,
  – is generally decreasing.

• This suggests we can stop skipping if the ratio > a certain threshold c.


Cont.

• Can be done very efficiently
  – We can easily generate multiple solutions.

[Figure: non-skipping version and skipping version of the trellis, with the point marked "Start backtracking here"]


Evaluation of LLRT in Gaussian noise replacement

• It works.
  – No degradation in the clean condition.
  – Improvement in the noisy condition.
  – A single value works for all conditions, e.g. c = 90.

          BL      LLRT
Clean    98.90   98.98
Noisy    78.33   95.61


Discussion

• In LLRT, the threshold c
  – effectively means the minimum likelihood of the clean frames.
  – The success of LLRT suggests
    • skipping frames with likelihood smaller than c.
• Simplified Frame-Skipping Viterbi Algorithm (SFSVA)
• The update formula can be expressed as

$$\hat{b}(o_t) = \begin{cases} b(o_t), & \text{if } b(o_t) > c \\ c, & \text{else.} \end{cases}$$
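Read this way, the simplified algorithm only needs a floor on the observation scores before an ordinary Viterbi pass, so no extra skip levels have to be stored. A minimal sketch in the log domain; the function name, the list-of-lists layout and the illustrative floor value are assumptions, and the scale of c used in the experiments is not specified here.

```python
def floor_observation_scores(log_b, log_c):
    """Replace every log observation score below log_c by log_c.

    log_b[t][j] is the log observation score of frame t in state j.
    A frame that scores below the floor in every state contributes the
    same constant to all paths, which has the same effect as skipping it.
    """
    return [[max(score, log_c) for score in frame] for frame in log_b]

# Frame 1 is badly matched by both states and is flattened to the floor.
scores = [[-2.0, -3.0], [-80.0, -95.0]]
floored = floor_observation_scores(scores, -10.0)
```

The floored scores can then be passed to any standard Viterbi decoder unchanged.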


Simplified FSVA : Preliminary Evaluation

• At c = 90

          BL      FSVA+LLRT   SFSVA
Clean    98.90    98.98       98.86
Noisy    78.33    95.61       95.61

• Performance comparable to FSVA+LLRT.


Evaluation II : Further Evidences


Evaluation II

• The previous experiment in Evaluation I used
  – fixed spectral content (Gaussian noise),
  – fixed amplitude,
  – fixed duration (1 frame),
  – replacement noise.
  – This is not general enough.

• Experiment 1: additive short-time noise
  – With varying spectral content, amplitude, duration and occurrence.
• Experiment 2: GSM environment (replacement noise)
  – Replacement with comfort noise.
  – The comfort noise is similar to speech in this case.


Experiment 1 (Setup)

• The training set is the same as in Evaluation I.
• Additive short-time noise.
  – Frames are picked at random from 7 types of noise, such as ring-tones and ICQ message sounds.
  – Controlled by 3 factors:
    • amplitude (SNR),
    • duration (L),
    • rate of corruption (C).

• FSVA + LLRT is used in the evaluation.


Experiment 1 (Results)

• Changing amplitude, C = 20%, L = 1

SNR (dB)   BL      LLRT (opt.)
clean      98.90   98.99 (102)
10         98.62   98.67 (106)
0          97.57   97.99 (102)
-10        84.04   91.89 (94)


Experiment 1 (Results) (cont.)

• Changing rate of corruption, SNR = -10 dB, L = 1

Rate   BL      LLRT (opt.)
20%    84.04   91.89 (94)
30%    69.21   82.96 (94)
40%    56.39   71.15 (94)


Experiment 1 (Results) (cont.)

• Changing length of corruption, SNR = -10 dB, C = 20%

Length   BL      LLRT (opt.)
1        84.04   91.89 (94)
2        87.33   91.99 (100)
3        90.24   94.07 (94)
4        93.30   96.22 (92)
5        95.17   97.09 (94)
6        95.69   97.31 (92)


Experiment 1 (Results) (cont.)

• Average performance.

• Outperforms the baseline over a wide range of c.

• For c in [90, 100]
  – close to optimal performance.


Experiment 1 (Summary of Results)

• FSVA + LLRT works in all conditions:
  – no degradation for SNR > 0 dB,
  – outperforms the Viterbi algorithm in the other cases.

• Is it necessary to use the optimal threshold?
  – No.
  – A large range of values of c outperforms the Viterbi algorithm.
  – A large range of values of c can be used such that
    • the result is close to optimal,
    • tuning is needed in a single condition only.


Experiment 2 (Comfort Noise Generation)

• GSM codec (GSM 06.10)
  – Regular Pulse Excited - Long Term Prediction (RPE-LTP)
  – Linear predictive analysis and synthesis
    • The residual coefficients are important.
• Comfort Noise Generation (GSM 06.11)
  – 1st frame: replaced by the last good frame.
  – 2nd to 16th frame: the magnitude of the residual coefficients of the 1st frame is decreased.
  – Frames after the 16th: a predefined "silence" frame is substituted.

• The generator cannot deal with frame loss of long duration.


Experiment 2 (Setup)

• Using the AURORA database.
  – Down-sampled version of TIDIGITS.
  – 8008 training utterances.
  – 4004 testing utterances.

• Baseline result
  – Trained on GSM-coded speech, tested on GSM-coded speech: 98.64% (< 98.90%).


Experiment 2 (Frame Loss Condition)

• Experiment in noisy condition
  – 1%~2% of the frames are corrupted.
  – All skip positions are known to the comfort noise generator.
  – Comfort noise generation is done before speech recognition.
  – 2 factors are controlled:
    • rate of corruption,
    • length of corruption.


Experiment 2 (Results)

(D: length of corruption in frames, C: rate of corruption)

D   C     BL      LLRT (opt.)
1   1%    98.03   98.12 (104)
2   1%    96.47   97.40 (104)
3   1%    96.10   97.19 (98)
1   2%    97.71   97.95 (106)
1   5%    96.31   97.20 (98)
1   10%   92.98   95.33 (98)


Experiment 2 (Average Performance)


Experiment 2 (Summary of Results)

• Corruptions of 1 frame can be handled by the comfort noise generator.

• FSVA still has value
  – when the length of corruption > 1,
  – when the rate of corruption increases.
  – After all, there is no degradation even for D = 1.


Conclusion and Future Work


Contribution of this work

• FSVA: Frame Skipping Viterbi Algorithm
  – found to be theoretically interesting,
  – can be implemented easily and efficiently,
  – gives good results in simulated noise.
• The search technique can be applied to fast computation of the Extended Union Model (EUM).
• LLRT: Log Likelihood Ratio Thresholding
  – Automatically determines the number of skips for FSVA.
• Preliminary study of SFSVA (simplified FSVA)
  – Same amount of memory as the Viterbi algorithm.
  – Improvement comparable to FSVA + LLRT.


Impact of this work

• HMMs have a wide range of applications in pattern recognition and digital communication.
  – FSVA can be used to deal with time-limited (or space-limited) corruption in these applications.


Future Work

• Other possibilities implied by MFT.
  – Don't ignore, but impute.
  – When should we ignore a frame? When should we impute it?
• Combination of FSVA and model-compensation techniques
  – to deal with general additive noise.
• Automatic skip determination: any other combination schemes?
  – E.g. ROVER with confidence and voting?
• Evaluation with the comfort noise generators of other codecs.
  – E.g. Voice over IP (VoIP)
• Extend FSVA to other applications that use HMMs.


Thanks for your patience !


Q & A


3, Have you tried your algorithm on Aurora?

• Yes! We tried it on AURORA II.
• But FSVA doesn't work there because
  – most of the noise is additive,
    • e.g. street noise,
    • e.g. babble noise;
  – the database is designed for feature extraction.


4, It is hard to get a clean speech corpus. How do you solve this?

• Our paradigm assumes
  – training on clean speech,
  – testing on noisy speech.

• A complementary method (not yet successful)
  – Train on noisy speech.
  – Test on clean speech.
  – Difficult because the multiple-mixture paradigm is hard to beat.


5, Can we incorporate burst corruption in FSVA?

• It is possible but not elegant.

[Figure: HMM with a burst skip state and a skip state]


6, Relation between Noise Composition?

• Not yet thoroughly understood.

• Decompose FSHMM will result


7, How about Null Node?

• This is a little bit tricky.
• A skip state is a real state.
• A null state cannot result in the skipping of a frame,
  – because no frame is consumed!

[Figure: skipping version with states 1_0, 1_1, 2_0, 2_1, 3_0, 3_1 and null nodes]


8, Have you considered any real-life examples of additive noise?

• Yes!
• Not presented in the thesis or the presentation.
• We have tested on the machine gun noise in NOISEX-92.
• Results: 0.7% absolute gain or no gain.
• Cause: the machine gun noise in NOISEX-92 corrupts all speech frames.
  – It is better regarded as additive noise.
  – The recording was made while the gunner shot continuously for several minutes. (Can this be real?)
  – A positive result was obtained when the additive noise component was removed.
  – Not reported because it may not be easily accepted by the community.


10, Examples of extending this idea to other applications?

• Yes.


11, How can this idea be used in convolutional coding?


12, What is your plan for combining other techniques with FSVA?


13, Do you really think short-time noise should always be shorter than speech?

• There is an intrinsic difficulty in defining short-time noise.
• Dictionaries of technology always characterize short-time noise as having
  – random spectral content,
  – random amplitude,
  – random occurrence.
• There is no characterization in terms of length.
• The length of the speech may be the basic norm for the length of the noise.


14, How do you compare this with other similar techniques?

• As we have mentioned,
  – there is another technique called EUM search.


15, Actually, what makes FSVA work?

• Sorry!

• This is a problem we do not thoroughly understand.

• We obtained some strange results.

• Hypothesis: partially corrupted frames.


17, Why do you keep the transition probability in your formulation?

• In theory, we can also ignore the transition contribution. However,
  – changing the transitions means breaking the word apart,
  – it would be disastrous if a phone were deleted or distorted.


Topological Expansion of SFSHMM