Pei-ning Chen, NTNU CSIE SLP Lab

DISCRIMINATIVE TRAINING BASED ON AN INTEGRATED VIEW OF MPE AND MMI IN MARGIN AND ERROR SPACE

Erik McDermott, Shinji Watanabe and Atsushi Nakamura

ICASSP 2010

Outline

• Introduction

• Margin-based MPE, MMI, and dMMI

• Macroscopic analysis using the error-indexed forward-backward algorithm

• Experimental results

• Conclusions

Introduction

• It was shown that MPE or MPFE (Minimum Phone Frame Error) corresponds to the derivative of the margin-modified MMI objective function with respect to the margin term.

• A new framework, “differenced MMI” (dMMI), was proposed in which the objective function is an integral of MPE-style loss over a given margin interval.

Margin-based MPE

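The margin-based MPE loss, with margin parameter $\sigma$ and with $\epsilon_{r,n}$ the error count of string $S_n$ for utterance $X_r$:

$$f_{MPE}^{\sigma} = \sum_{r=1}^{R} \frac{\sum_{n} P(S_n)\, p(X_r \mid S_n)\, e^{\sigma \epsilon_{r,n}}\, \epsilon_{r,n}}{\sum_{k} P(S_k)\, p(X_r \mid S_k)\, e^{\sigma \epsilon_{r,k}}}$$

Rewrite the cost function in terms of pair-wise comparisons

$$m_{k,n}(X_r, \Lambda) = \log P(S_k)\, p(X_r \mid S_k) - \log P(S_n)\, p(X_r \mid S_n)$$

Then the modified MPE loss can be expressed as

$$f_{MPE}^{\sigma} = \sum_{r=1}^{R} \sum_{n} \frac{\epsilon_{r,n}}{\sum_{k} e^{\, m_{k,n}(X_r, \Lambda) + \sigma (\epsilon_{r,k} - \epsilon_{r,n})}}$$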
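The equivalence of the two forms of the margin-based MPE loss can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a small N-best list for a single utterance instead of a lattice, the scores and error counts are made-up numbers, and the function names `margin_mpe_direct` and `margin_mpe_pairwise` are hypothetical.

```python
import math

def margin_mpe_direct(log_scores, errors, sigma):
    """Expected error when each hypothesis score is boosted by exp(sigma * error)."""
    boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
    top = max(boosted)  # subtract the max before exponentiating, for stability
    weights = [math.exp(b - top) for b in boosted]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

def margin_mpe_pairwise(log_scores, errors, sigma):
    """Same quantity, written with pair-wise terms m_kn = score_k - score_n."""
    total = 0.0
    for s_n, e_n in zip(log_scores, errors):
        denom = sum(math.exp((s_k - s_n) + sigma * (e_k - e_n))
                    for s_k, e_k in zip(log_scores, errors))
        total += e_n / denom
    return total

# Hypothetical 4-best list for one utterance (illustrative numbers only).
log_scores = [-10.0, -10.5, -11.2, -12.0]  # log P(S_k) p(X_r | S_k)
errors = [0, 1, 2, 4]                      # error count of each hypothesis

for sigma in (-1.0, 0.0, 0.5):
    direct = margin_mpe_direct(log_scores, errors, sigma)
    pairwise = margin_mpe_pairwise(log_scores, errors, sigma)
    print(f"sigma={sigma:+.1f}  direct={direct:.6f}  pairwise={pairwise:.6f}")
```

The two columns should agree for every margin value, and the loss should grow with σ, since a larger margin shifts posterior mass toward high-error hypotheses.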

Margin-based MMI

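$$F_{MMI}^{\sigma} = \sum_{r=1}^{R} \log \frac{P(S_r)\, p(X_r \mid S_r)}{\sum_{k} P(S_k)\, p(X_r \mid S_k)\, e^{\sigma \epsilon_{r,k}}}$$

• Using the same pair-wise comparisons

$$F_{MMI}^{\sigma} = \sum_{r=1}^{R} \log \frac{1}{\sum_{k} e^{\, m_{k,r}(X_r, \Lambda) + \sigma \epsilon_{r,k}}}$$

where $m_{k,r}(X_r, \Lambda) = \log P(S_k)\, p(X_r \mid S_k) - \log P(S_r)\, p(X_r \mid S_r)$ compares string $S_k$ with the reference string $S_r$.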

• It is easy to show that MPE (margin-based or not) is, up to the sign convention used for the two criteria, the derivative of margin-based MMI with respect to σ
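The derivative relation can be checked with a finite difference. The following is a minimal sketch, assuming an N-best list for one utterance, MMI written as a log-ratio objective to be maximised, and MPE written as an expected error count to be minimised; under those assumed conventions the derivative of MMI with respect to σ equals the MPE loss with a minus sign. All names and numbers are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def margin_mmi(log_scores, errors, ref_index, sigma):
    """Margin-based MMI objective for one utterance (log-ratio form)."""
    boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
    return log_scores[ref_index] - log_sum_exp(boosted)

def margin_mpe(log_scores, errors, sigma):
    """Margin-based MPE loss for one utterance (expected error count)."""
    boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
    norm = log_sum_exp(boosted)
    return sum(math.exp(b - norm) * e for b, e in zip(boosted, errors))

def mmi_sigma_derivative(log_scores, errors, ref_index, sigma, step=1e-5):
    """Central finite-difference estimate of d(MMI)/d(sigma)."""
    hi = margin_mmi(log_scores, errors, ref_index, sigma + step)
    lo = margin_mmi(log_scores, errors, ref_index, sigma - step)
    return (hi - lo) / (2.0 * step)

# Hypothetical 4-best list; hypothesis 0 is the reference (zero errors).
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

for sigma in (-1.0, 0.0, 0.5):
    slope = mmi_sigma_derivative(log_scores, errors, 0, sigma)
    loss = margin_mpe(log_scores, errors, sigma)
    print(f"sigma={sigma:+.1f}  dMMI/dsigma={slope:.6f}  -MPE={-loss:.6f}")
```

The two printed columns should match to within the accuracy of the finite difference.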

Differenced MMI

• It is defined in terms of an integral of MPE loss over a given margin interval
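The link between a difference of MMI values and an integral of MPE loss can be illustrated numerically. This sketch makes several assumptions that are not taken from the slides: it works on an N-best list for one utterance, it scales the difference by the interval width σ2 − σ1, and it uses the sign convention in which the differenced quantity is an objective to be maximised (so it equals minus the average MPE loss over the interval). The function names are hypothetical.

```python
import math

def log_partition(log_scores, errors, sigma):
    """log sum_k exp(score_k + sigma * error_k), computed stably."""
    boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
    top = max(boosted)
    return top + math.log(sum(math.exp(b - top) for b in boosted))

def margin_mpe(log_scores, errors, sigma):
    """Margin-based MPE loss (expected error count) at margin sigma."""
    norm = log_partition(log_scores, errors, sigma)
    return sum(math.exp(s + sigma * e - norm) * e
               for s, e in zip(log_scores, errors))

def dmmi(log_scores, errors, sigma1, sigma2):
    """MMI(sigma2) - MMI(sigma1), divided by the interval width.
    The reference-string term is identical at both margins and cancels."""
    return (log_partition(log_scores, errors, sigma1)
            - log_partition(log_scores, errors, sigma2)) / (sigma2 - sigma1)

def mean_mpe_over_interval(log_scores, errors, sigma1, sigma2, steps=2000):
    """Midpoint-rule average of the MPE loss over [sigma1, sigma2]."""
    width = (sigma2 - sigma1) / steps
    total = sum(margin_mpe(log_scores, errors, sigma1 + (i + 0.5) * width)
                for i in range(steps))
    return total / steps

# Hypothetical 4-best list for one utterance (illustrative numbers only).
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

print("wide interval  :", dmmi(log_scores, errors, -1.0, 1.0),
      -mean_mpe_over_interval(log_scores, errors, -1.0, 1.0))
print("narrow interval:", dmmi(log_scores, errors, -0.01, 0.01),
      -margin_mpe(log_scores, errors, 0.0))
```

On the wide interval the differenced value matches the averaged MPE loss; on the narrow interval it approaches the standard (σ = 0) MPE loss.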

Optimization based on dMMI

• For a given arc q in a recognition lattice for utterance Xr,

– where the occupancy term is the standard arc posterior probability (occupancy) calculated with the Forward-Backward algorithm.

• The corresponding lattice arc occupancies are subtracted and divided by σ2 − σ1:
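The differencing step can be sketched on whole hypotheses instead of lattice arcs. This is a simplified stand-in, not the lattice computation: string posteriors from an N-best list play the role of arc occupancies, the order of subtraction (occupancy at σ1 minus occupancy at σ2) is an assumption chosen so that the result is the gradient of a maximised objective, and every name is hypothetical.

```python
import math

def margin_posteriors(log_scores, errors, sigma):
    """Posterior of each hypothesis after boosting scores by sigma * error."""
    boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
    top = max(boosted)
    weights = [math.exp(b - top) for b in boosted]
    total = sum(weights)
    return [w / total for w in weights]

def differenced_occupancies(log_scores, errors, sigma1, sigma2):
    """Occupancies at the two interval edges, subtracted and divided by the width."""
    g1 = margin_posteriors(log_scores, errors, sigma1)
    g2 = margin_posteriors(log_scores, errors, sigma2)
    return [(a - b) / (sigma2 - sigma1) for a, b in zip(g1, g2)]

def dmmi(log_scores, errors, sigma1, sigma2):
    """Differenced objective used to check the gradient (assumed form)."""
    def log_z(sigma):
        boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
        top = max(boosted)
        return top + math.log(sum(math.exp(b - top) for b in boosted))
    return (log_z(sigma1) - log_z(sigma2)) / (sigma2 - sigma1)

# Hypothetical 4-best list for one utterance (illustrative numbers only).
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

grad = differenced_occupancies(log_scores, errors, -1.0, 1.0)
for e, g in zip(errors, grad):
    print(f"errors={e}  differenced occupancy={g:+.4f}")
```

In this toy setting the zero-error hypothesis receives a positive value and the highest-error hypothesis a negative one, so a gradient step raises the score of the former and lowers the score of the latter.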

Optimization based on dMMI

• The total gradient for all parameter components Λi, summed over all training utterances and all Qr arcs in each utterance’s recognition lattice, can then be calculated

The error-indexed forward-backward algorithm

• An aggregate probability mass for all lattice strings with the same total error count j :

• The corresponding margin-modified error group occupancy is

• The standard (σ = 0) error group MPE derivative is

• The aggregated dMMI derivative is
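The error-group quantities listed above can be illustrated on a toy example. The sketch below does not implement the error-indexed forward-backward pass: it assumes an explicit N-best list, so probability mass can be grouped by error count directly. The sign convention (dMMI as a maximised objective, MPE as a loss) and all names are assumptions made for the illustration.

```python
import math
from collections import defaultdict

def error_group_occupancies(log_scores, errors, sigma):
    """Posterior mass of each total error count j at margin sigma."""
    boosted = [s + sigma * e for s, e in zip(log_scores, errors)]
    top = max(boosted)
    mass = defaultdict(float)
    for b, e in zip(boosted, errors):
        mass[e] += math.exp(b - top)  # aggregate strings with the same error count
    total = sum(mass.values())
    return {j: m / total for j, m in sorted(mass.items())}

def group_mpe_derivative(log_scores, errors):
    """Standard (sigma = 0) MPE loss derivative per error group."""
    gamma = error_group_occupancies(log_scores, errors, 0.0)
    average = sum(g * j for j, g in gamma.items())
    return {j: g * (j - average) for j, g in gamma.items()}

def group_dmmi_derivative(log_scores, errors, sigma1, sigma2):
    """Differenced group occupancies, scaled by the interval width."""
    g1 = error_group_occupancies(log_scores, errors, sigma1)
    g2 = error_group_occupancies(log_scores, errors, sigma2)
    return {j: (g1[j] - g2[j]) / (sigma2 - sigma1) for j in g1}

# Hypothetical 5-best list; two hypotheses share the same error count.
log_scores = [-10.0, -10.5, -10.8, -11.2, -12.0]
errors = [0, 1, 1, 2, 4]

mpe = group_mpe_derivative(log_scores, errors)
narrow = group_dmmi_derivative(log_scores, errors, -0.01, 0.01)
wide = group_dmmi_derivative(log_scores, errors, -3.0, 1.0)
for j in mpe:
    print(f"j={j}  -MPE={-mpe[j]:+.4f}  narrow={narrow[j]:+.4f}  wide={wide[j]:+.4f}")
```

With a narrow interval around zero the differenced values track the negated MPE derivative closely, while a wide interval redistributes weight across error levels.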


Experimental results

Conclusion

• A new approach to discriminative training (DT): “differenced MMI”.

• Experiments confirmed that a close approximation to MPE can be implemented using dMMI.

• Aggregate error-group statistics show that the choice of interval affects the relative weighting of different error levels during training.

• The proper choice of margin interval is a topic for future research.
