Discriminative Training Based on an Integrated View of MPE and MMI in Margin and Error Space
Erik McDermott, Shinji Watanabe and Atsushi Nakamura
ICASSP 2010
Pei-ning Chen, NTNU CSIE SLP Lab
Outline
• Introduction
• Margin-based MPE, MMI, and dMMI
• Macroscopic analysis using the error-indexed forward-backward algorithm
• Experimental results
• Conclusions
Introduction
• It was shown that MPE or MPFE (Minimum Phone Frame Error) corresponds to the derivative of the margin-modified MMI objective function with respect to the margin term.
• A new framework, “differenced MMI” (dMMI), was proposed in which the objective function is an integral of MPE-style loss over a given margin interval.
Margin-based MPE
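The margin-modified MPE loss is

$$f_{\mathrm{MPE}}(\Lambda, \sigma) = \sum_{r=1}^{R} \frac{\sum_{n} P(S_n)\, p(X_r \mid S_n)\, e^{\sigma E_{r,n}}\, E_{r,n}}{\sum_{k} P(S_k)\, p(X_r \mid S_k)\, e^{\sigma E_{r,k}}}$$

where E_{r,n} is the error count of string S_n measured against the reference S_r, σ is the margin parameter, and Λ denotes the model parameters.

Rewrite the cost function in terms of pair-wise comparisons

$$m_{k,n}(X_r, \Lambda) = \log P(S_k)\, p(X_r \mid S_k) - \log P(S_n)\, p(X_r \mid S_n)$$

Then the modified MPE loss can be expressed as

$$f_{\mathrm{MPE}}(\Lambda, \sigma) = \sum_{r=1}^{R} \sum_{n} \frac{E_{r,n}}{1 + \sum_{k \neq n} e^{\, m_{k,n}(X_r, \Lambda) + \sigma (E_{r,k} - E_{r,n})}}$$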
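The equivalence of the two forms of the MPE loss can be checked numerically. The following is a minimal sketch for a single utterance, assuming a small N-best list stands in for the full set of strings; the scores, error counts, and function names are hypothetical and chosen only for illustration.

```python
import math

def mpe_loss_direct(log_scores, errors, sigma=0.0):
    """Margin-modified MPE loss for one utterance as a ratio of weighted sums."""
    weights = [math.exp(s + sigma * e) for s, e in zip(log_scores, errors)]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

def mpe_loss_pairwise(log_scores, errors, sigma=0.0):
    """The same loss written with the pair-wise comparisons m_{k,n}."""
    total = 0.0
    for n, (s_n, e_n) in enumerate(zip(log_scores, errors)):
        denom = 1.0  # the k == n term contributes exp(0) = 1
        for k, (s_k, e_k) in enumerate(zip(log_scores, errors)):
            if k != n:
                m_kn = s_k - s_n  # log-score difference between strings k and n
                denom += math.exp(m_kn + sigma * (e_k - e_n))
        total += e_n / denom
    return total

# Hypothetical 4-best list: joint log scores log P(S_n) p(X_r|S_n) and error counts.
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

for sigma in (-0.5, 0.0, 1.0):
    print(sigma,
          mpe_loss_direct(log_scores, errors, sigma),
          mpe_loss_pairwise(log_scores, errors, sigma))
```

Setting sigma to 0 gives the standard (margin-free) MPE loss; a positive margin shifts weight toward strings with more errors.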
Margin-based MMI
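• The margin-based MMI objective function is

$$F_{\mathrm{MMI}}(\Lambda, \sigma) = \sum_{r=1}^{R} \log \frac{P(S_r)\, p(X_r \mid S_r)}{\sum_{k} P(S_k)\, p(X_r \mid S_k)\, e^{\sigma E_{r,k}}}$$

• Using the same pair-wise comparisons

$$F_{\mathrm{MMI}}(\Lambda, \sigma) = -\sum_{r=1}^{R} \log\Big(1 + \sum_{k \neq r} e^{\, m_{k,r}(X_r, \Lambda) + \sigma E_{r,k}}\Big)$$

where E_{r,k} is the error count of string S_k against the reference S_r (so E_{r,r} = 0) and σ is the margin parameter.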
• It is easy to show that the MPE loss (margin-based or not) is, up to the sign convention, the derivative of the margin-based MMI objective with respect to the margin parameter σ
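The derivative relation can be verified with a finite difference. This sketch assumes the margin enters the MMI denominator as a factor exp(σ · error count), in which case the derivative of the MMI objective equals the negative of the MPE loss; the toy data and function names are hypothetical.

```python
import math

def mmi_objective(log_scores, errors, sigma, ref=0):
    """Margin-based MMI for one utterance: reference log score minus
    the log of the margin-weighted sum over all strings."""
    log_den = math.log(sum(math.exp(s + sigma * e)
                           for s, e in zip(log_scores, errors)))
    return log_scores[ref] - log_den

def mpe_loss(log_scores, errors, sigma):
    """Margin-modified MPE loss: expected error under margin-weighted posteriors."""
    w = [math.exp(s + sigma * e) for s, e in zip(log_scores, errors)]
    return sum(wi * e for wi, e in zip(w, errors)) / sum(w)

def numerical_derivative(f, x, h=1e-5):
    """Central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical 4-best list; index 0 is the reference string (zero errors).
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

for sigma in (-0.5, 0.0, 1.0):
    d = numerical_derivative(lambda s: mmi_objective(log_scores, errors, s), sigma)
    print(sigma, d, -mpe_loss(log_scores, errors, sigma))
```

The two printed columns agree to within the finite-difference error, for the margin-free case (σ = 0) as well as for nonzero margins.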
Differenced MMI
• It is defined in terms of an integral of the MPE loss over a given margin interval [σ1, σ2]
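Because the MPE loss is (up to sign) the derivative of margin-based MMI, an integral of MPE loss over a margin interval can be written as a difference of two margin-based MMI values. The sketch below illustrates this on a toy N-best list. The exact normalization and sign used here (difference divided by σ2 − σ1, treated as an objective to maximize) are assumptions for illustration, as are all names and numbers.

```python
import math

def log_margin_sum(log_scores, errors, sigma):
    """Log of the margin-weighted sum over all strings."""
    return math.log(sum(math.exp(s + sigma * e)
                        for s, e in zip(log_scores, errors)))

def mpe_loss(log_scores, errors, sigma):
    """Margin-modified MPE loss for one utterance."""
    w = [math.exp(s + sigma * e) for s, e in zip(log_scores, errors)]
    return sum(wi * e for wi, e in zip(w, errors)) / sum(w)

def dmmi(log_scores, errors, sigma1, sigma2):
    """Margin-based MMI at sigma2 minus MMI at sigma1, divided by the width.
    The reference-string term is identical at both margins and cancels."""
    return (log_margin_sum(log_scores, errors, sigma1)
            - log_margin_sum(log_scores, errors, sigma2)) / (sigma2 - sigma1)

def mean_mpe_over_interval(log_scores, errors, sigma1, sigma2, steps=2000):
    """Midpoint-rule average of the MPE loss over [sigma1, sigma2]."""
    h = (sigma2 - sigma1) / steps
    total = sum(mpe_loss(log_scores, errors, sigma1 + (i + 0.5) * h)
                for i in range(steps))
    return total / steps

# Hypothetical 4-best list.
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

print(dmmi(log_scores, errors, -1.0, 1.0),
      -mean_mpe_over_interval(log_scores, errors, -1.0, 1.0))
print(dmmi(log_scores, errors, -0.01, 0.01), -mpe_loss(log_scores, errors, 0.0))
```

The second line of output shows that a narrow interval around σ = 0 gives a close approximation to the (negated) standard MPE loss.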
Optimization based on dMMI
• For a given arc q in a recognition lattice for utterance X_r, the derivative is expressed in terms of the standard arc posterior probability, or occupancy, calculated with the Forward-Backward algorithm.
• The corresponding lattice arc occupancies at the two margin values are subtracted and divided by σ2 − σ1:
• The total gradient for all parameter components Λ_i, summed over all training utterances and all Q_r arcs in each utterance’s recognition lattice, can then be calculated
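The "subtract occupancies and divide by σ2 − σ1" step can be illustrated without a lattice. In this sketch each hypothesis of a toy N-best list plays the role of a single arc, so its posterior is its occupancy. That simplification, the sign convention of the dMMI objective, and all names are assumptions made for illustration only.

```python
import math

def posteriors(log_scores, errors, sigma):
    """Margin-modified posterior (occupancy) of each hypothesis."""
    w = [math.exp(s + sigma * e) for s, e in zip(log_scores, errors)]
    z = sum(w)
    return [wi / z for wi in w]

def dmmi(log_scores, errors, sigma1, sigma2):
    """Differenced MMI for one utterance (reference term cancels)."""
    def log_sum(sigma):
        return math.log(sum(math.exp(s + sigma * e)
                            for s, e in zip(log_scores, errors)))
    return (log_sum(sigma1) - log_sum(sigma2)) / (sigma2 - sigma1)

def dmmi_gradient(log_scores, errors, sigma1, sigma2):
    """Gradient of dmmi with respect to each hypothesis log score:
    occupancy at sigma1 minus occupancy at sigma2, divided by the width."""
    g1 = posteriors(log_scores, errors, sigma1)
    g2 = posteriors(log_scores, errors, sigma2)
    return [(a - b) / (sigma2 - sigma1) for a, b in zip(g1, g2)]

# Hypothetical 4-best list.
log_scores = [-10.0, -10.5, -11.2, -12.0]
errors = [0, 1, 2, 4]

print(dmmi_gradient(log_scores, errors, -1.0, 1.0))
```

In a real lattice the two sets of occupancies would come from two Forward-Backward passes, one per margin value, rather than from explicit enumeration of hypotheses.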
The error-indexed forward-backward algorithm
• An aggregate probability mass for all lattice strings with the same total error count j :
• The corresponding margin-modified error group occupancy is
• The standard (σ = 0) error group MPE derivative is
• The aggregated dMMI derivative is
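The error-group quantities listed above can be sketched by brute-force aggregation over an N-best list. This is not the error-indexed forward-backward algorithm itself, which works on lattices; it only shows what the aggregated statistics look like. The specific formulas below (group occupancy proportional to mass times exp(σ j), derivatives taken with respect to a group's log mass and oriented as loss derivatives) are assumptions, as are all names and numbers.

```python
import math
from collections import defaultdict

def error_group_mass(log_scores, errors):
    """Aggregate probability mass of all strings sharing the same error count j."""
    mass = defaultdict(float)
    for s, e in zip(log_scores, errors):
        mass[e] += math.exp(s)
    return dict(mass)

def group_occupancy(mass, sigma):
    """Margin-modified occupancy of each error group."""
    w = {j: m * math.exp(sigma * j) for j, m in mass.items()}
    z = sum(w.values())
    return {j: v / z for j, v in w.items()}

def mpe_group_derivative(mass):
    """Standard (sigma = 0) MPE loss derivative per error group."""
    occ = group_occupancy(mass, 0.0)
    mean_err = sum(j * g for j, g in occ.items())
    return {j: g * (j - mean_err) for j, g in occ.items()}

def dmmi_group_derivative(mass, sigma1, sigma2):
    """Differenced group occupancies, oriented as a loss derivative."""
    o1 = group_occupancy(mass, sigma1)
    o2 = group_occupancy(mass, sigma2)
    return {j: (o2[j] - o1[j]) / (sigma2 - sigma1) for j in mass}

# Hypothetical 5-best list; two strings share error count 1.
log_scores = [-10.0, -10.5, -10.7, -11.2, -12.0]
errors = [0, 1, 1, 2, 4]

mass = error_group_mass(log_scores, errors)
print(mpe_group_derivative(mass))
print(dmmi_group_derivative(mass, -0.01, 0.01))  # narrow interval, close to MPE
print(dmmi_group_derivative(mass, -3.0, 1.0))    # wider interval reweights error levels
```

Comparing the last two lines shows how the choice of margin interval changes the relative weight given to each error level.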
Experimental results
Conclusion
• A new approach to discriminative training (DT), “differenced MMI” (dMMI), was presented.
• Experiments confirmed that a close approximation to MPE can be implemented using dMMI.
• Aggregate error-group statistics show that the choice of margin interval affects the relative weighting of different error levels during training.
• The proper choice of margin interval is a topic for future research.