Upload
leon
View
64
Download
0
Tags:
Embed Size (px)
DESCRIPTION
DSP homework 1 HMM Training and Testing. Advisor: Lin-Shan Lee Date: 2007.04.10. Digit Recognizer. Construct a Digit Recognizer Free tools of HMM: HTK http://htk.eng.cam.ac.uk/ Training data, testing data, and some script. http://speech.ee.ntu.edu.tw/courses/DSP/. - PowerPoint PPT Presentation
Citation preview
DSP homework 1 HMM Training and Testing
Advisor: Lin-Shan Lee
Date: 2007.04.10
Digit Recognizer
Construct a Digit Recognizer
Free tools of HMM: HTK http://htk.eng.cam.ac.uk/
Training data, testing data, and some script. http://speech.ee.ntu.edu.tw/courses/DSP/
Digit Recognizer - Flowchart
Feature extraction
Initialize HMM
Re-estimate
Training wave data
Acoustic model
modify
Testing wave data
Viterbi search
Compare with answers
Training MFCC data
Testing MFCC data
accuracy
Training phase
Thanks for HTK!
We will focus on these HTK tools
configurationFeature extraction
HCopy
Initialize
HCompVRe-estimate
HERest
Training wave data
Acoustic model
Modify
HHEd
Testing wave data
Viterbi search
HVite
Compare with answers
HResult
Training MFCC data
Testing MFCC data
accuracy
Training phase
Construct word net
HParseRules and dictionary
HCopy - Feature extraction
HCopy -C lib/hcopy.cfg –S scripts/training_hcopy.scp Convert wave to 39 dimension MFCC -C lib/hcopy.cfg
Input and output formate.g. wav -> MFCC_Z_E_D_A
Parameters of feature extraction Chapter 7 - Speech Signals and Front-end Processing
–S scripts/training_hcopy.scp A mapping from Input file name to output file name
speechdata/training/N110022.wav MFCC/training/N110022.mfc
…
Initial MMF prototype
MMF: HTK chapter 7~o <VECSIZE> 39 <MFCC_Z_E_D_A>~h "proto"<BeginHMM><NumStates> 5<State> 2<Mean> 390.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 …<Variance> 391.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 …<State> 3<Mean> 130.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 …<Variance> 13…<TransP> 50.0 1.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.00.0 0.0 0.5 0.5 0.00.0 0.0 0.0 0.5 0.50.0 0.0 0.0 0.0 0.0<EndHMM>
1.00.5
0.50.5
0.50.5
0.5proto
mean=0 var=1
mean=0 var=1
mean=0 var=1
HCompV - Initialize
HCompV -C lib/config.fig -o hmmdef -M hmm -S scripts/training.scp lib/proto
Compute global mean and variance of features -C lib/config.fig
Set format of input feature. E.g. MFCC_Z_E_D_A
-o hmmdef -M hmm Set output name: hmm/hmmdef
-S scripts/training.scp A list of training data.
lib/proto A description of a HMM model. HTK MMF format
HCompV - Initialize
dim 1
All MFCC features
……
dim 39
… …
+
-C lib/config.fig
-o hmmdef -M hmm
-S scripts/training.scp
lib/proto
1.00.5
0.50.5
0.50.5
0.5proto
1.00.5
0.50.5
0.50.5
0.5hmmdef
Initial HMM
bin/macro Construct each HMM
bin/models_1mixsil Add silence HMM
These are written in C. But you can do these by a text editor.
1.00.5
0.50.5
0.50.5
0.5liN(零)
……
1.00.5
0.50.5
0.50.5
0.5Jiou(九)
1.00.6
0.40.6
0.40.7
0.3sil
hmm/models
1.00.5
0.50.5
0.50.5
0.5hmmdef
hmm/hmmdef
HERest - Adjust HMM model
HERest -C lib/config.cfg –S scripts/training.scp –I labels/Clean08TR.mlf -H hmm/macros -H hmm/models –M hmm lib/models.lst
Adjust parameters to maximize P(O|) one iteration of EM algorithm
Run this command three times => three iteration
–I labels/Clean08TR.mlf Set label file to “labels/Clean08TR.mlf”
lib/models.lst A list of word models. ( liN ( 零 ), #i ( 一 ), #er ( 二 ),… jiou ( 九 ), sil )
HERest - Adjust HMM model
Chapter 4 - basic problem 3 for HMM Give O and an initial model =(A,B,), adjust to
maximize P(O|)
sil liou (六) san (三) #er (二) sil
MFCC features
sil
Lable:
sil
liou san #er
HHEd – Modify HMMs
bin/spmodel_gen hmm/models hmm/models Add SP (short pause) HMM definition to MMF file “hmm/models”
HHEd -H hmm/macros -H hmm/models –M hmm
lib/sil1.hed lib/models_sp.lst
lib/sil1.hed A list of command to modify HMM definitions
lib/models_sp.lst A new list of model.
( liN ( 零 ), #i ( 一 ), #er ( 二 ),… jiou ( 九 ), sil, sp )
See HTK book 3.2.2 (p. 33)
sil
sp
HERest - Adjust HMM model againHERest -C lib/config.cfg –S scripts/training.scp
–I labels/Clean08TR_sp.mlf -H hmm/macros
-H hmm/models –M hmm lib/models_sp.lst
sil Liou (六) san (三) #er (二) sil
MFCC features
sil
Lable:
sil
liou san #er
sp sp
sp sp
HHEd - Increase numbers of mixtureHHEd -H hmm/macros -H hmm/models -M hmm lib/mix2_10.hed
lib/models_sp.lst
Increase numbers of mixture
HERest - Adjust HMM model againHERest -C lib/config.cfg –S scripts/training.scp
–I labels/Clean08TR_sp.mlf -H hmm/macros
-H hmm/models –M hmm lib/models_sp.lst
sil Liou (六) san (三) #er (二) sil
MFCC features
sil
Lable:
sil
liou san #er
sp sp
sp sp
Flowchart
configurationFeature extraction
HCopy
Initialize
HCompVRe-estimate
HERest
Training wave data
Acoustic model
Modify
HHEd
Testing wave data
Viterbi search
HVite
Compare with answers
HResult
Training MFCC data
Testing MFCC data
accuracy
Training phase
Construct word net
HParseRules and dictionary
HParse – construct word net
HParse lib/grammar_sp lib/wdnet_sp
lib/grammar_sp Regular expression
Lib/wdnet_sp Output Word net
……
sil
ling
yi
jiu
sp sil
HVite – Viterbi search
HVite -H hmm/macros -H hmm/models -S scripts/testing.scp -C lib/config.cfg -w lib/wdnet_sp -l '*' -i result/result.mlf -p 0.0 -s 0.0 lib/dict lib/models_sp.lst
Chapter 8.0 - Search Algorithms for Speech Recognition -w lib/wdnet_sp
Input word net
-i result/result.mlf Output MLF file
lib/dict Dictionary: a mapping from word to phone sequences E.g. ling -> liN, er -> #er, … . 一 -> sic_i i, 七 -> chi_i i
HResult – compare with answersHResults -e "???" sil -e "???" sp -I labels/answer.mlf lib/models_sp.lst
result/result.mlf
Longest Common Subsequence (LCS)
====================== HTK Results Analysis ==================== Date: Mon Apr 9 19:34:02 2007 Ref : labels/answer.mlf Rec : result/result.mlf------------------------ Overall Results --------------------------SENT: %Correct=91.88 [H=441, S=39, N=480]WORD: %Corr=98.56, Acc=97.41 [H=1713, D=17, S=8, I=20, N=1738]============================================================
Training flowchart
Initialize
HCompVAdd SP
HHEd
Training MFCC dataand Labels
MMF
HMM proto5 state
MMF
Silence
Re-estimate
HERest
Re-estimate
HERest
Re-estimate
HERest
Increase minxture
HHEd
HMM
x3
Requirement 1 – run baseline
Download homework package Set PATH for HTK tools execute (bash shell script):
01_run_HCopy.sh 02_run_HCompV.sh 03_training.sh 04_testing.sh
You can find acc in result/accuracy
Requirement 2– improve recognition
accuracy
Initialize
HCompVAdd SP
HHEd
Training MFCC dataand Labels
MMF
HMM protomore state
MMF
Silence
Re-estimate
HERest
Re-estimate
HERest
Re-estimate
HERest
Increase minxture
HHEd
HMM
x3
x3
x3
Original script is not correct!!
Increase number of states
Submit your homework
Request files: result/accuracy – the best acc you have done
hmm/models – your HMM models
Any script you wrote or modified
A document describe your training process and accuracy
Compress all files above into a zip file named “hw1_b9xxxxxx.zip” E-mail to TA with subject: [DSP] hw1_b9xxxxx <your name>
and your zip file
Deadline: 2007.4.25 0:00
Suggestion and other information Read HTK book chapter 3 first. (except 3.3, 3.5) Many hints are in the shell scripts. Perl or a good editor is your good friend. If you have problems about bash shell script, see
http://linux.vbird.org/linux_basic/0340bashshell-scripts.php If you want to run HTK under windows
Cygwin + HTK windows binary. If you have any question, mail to TA