A Simple Example of generating a recognition System using HTK For a Word Level Recognition Yoon-Joong Kim Hanbat National University



Brief Structure of HTK : HTKTools

[Figure: HTK tool pipeline; recoverable labels: wav, mfc, hmmdefs]

Setting an environment variable(path)

• Download HTKD32.zip and install HTK 3.2
– location: D:\HTK32\binwin32 //HTK library
  D:\HTK32\Data //speech data
• Set the environment variable (path)
– Control Panel\System and Security\System\Advanced system settings
– System Properties
• Advanced
– Environment Variables
» User variables for Administrator (the user account): variable: path, value: D:\HTK32\binwin32
» OK

A folder structure for an example

• D:/HTK32
  – binwin32
  – Data
    • Talker1
      – PBW1: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, ….
      – PBW2: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, ….
    • Talker2
      – PBW1: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, ….
      – PBW2
  – YJK
    • configs: Hcopy.config, config
    • scripts: Hcopy.scp, train.scp, test.scp
    • wordHmms
      – m0: proto, vFloors, hmmdefs
      – m1: hmmdefs
      – m2 …
    • mlfs: words.mlf, recOutWordm5.mlf
    • modelList: wordList
    • dic: pbwGram, pbwNet, dict

Summary

• Data Preparation

– Preparation of speech data(wave file) for training and testing.

• Feature Vector Generation (HCopy.exe)
– Wave file => MFCC file
• Generation of the initial hmmdefs (prototype, master macro file)
– Definition of a general HMM model (prototype) for one unit
– Variance computation (HCompV.exe) over all the speech data
– Generation of hmmdefs (a set of initial HMMs in a Master Macro File) for all units (words or phonemes) from the general HMM model
• Re-estimation of hmmdefs (HERest.exe)
• Recognition Test (HVite.exe)
• Result Analysis (HResults.exe)

Data Preparation

• Speech data for training and testing

– Data (wave files): 452 words × 7 sets
– Training data: 452 words × 7 sets
– Test data
  • Open test (speaker-independent recognition): 0 sets
  • Closed test (speaker-dependent recognition): 7 sets
– Location: HTK32/Data/<speaker name>/PBW1/<speech file>

Data Preparation

• Format of the Speech Data
– NIST format
  • NIST-format files with the extension ".wav"
  • 16 kHz, 16-bit, linear PCM
• Phonetically Balanced Words (PBW)
pbw1001.wav ⇒ "청와대" (Blue House)
pbw1002.wav ⇒ "컴퓨터" (computer)
pbw1003.wav ⇒ "그에게" (to him)
pbw1004.wav ⇒ "위대한" (great)
pbw1005.wav ⇒ "당뇨병" (diabetes)
pbw1006.wav ⇒ "그야말로" (truly)
pbw1007.wav ⇒ "예컨대" (for example)
pbw1008.wav ⇒ "분야에서" (in the field)

Feature Vector Generation

• Compute feature vectors: HCopy.exe
  • A copy program that extracts the features from each wave file and saves them in the same folder.
  • Features: LPC, MFCC, PLP
  • Wave file formats: HTK, Esignal, TIMIT, NIST Sphere, SCRIBE, SDES1, AIFF, SUNAU8, OGI, WAV, NOHEAD
– -C configs/Hcopy.config
  • Configuration file for feature extraction
– -S scripts/Hcopy.scp
  • Script file listing pairs of wave file and feature file
Before:
D:/HTK32/binwin32
/Data/speakerName/PBW1/: Pbw1001.wav, Pbw1002.wav, …
/configs: Hcopy.config
/scripts: Hcopy.scp
=> After:
D:/HTK32/binwin32
/Data/speakerName/PBW1/: Pbw1001.wav, Pbw1001.mfc, Pbw1002.wav, Pbw1002.mfc, …

HCopy -C configs/Hcopy.config -S scripts/Hcopy.scp


Feature Vector Generation

• Configuration file configs/Hcopy.config (the // notes are slide annotations, not part of the file)
#Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NIST //NIST format
SOURCERATE = 625 //sample period in 100 ns units: fs = 16 kHz, 1/16000 s = 0.0625 ms
TARGETKIND = MFCC_0 //MFCC + C0 (energy)
TARGETRATE = 100000.0 //frame shift: 10 ms, in 100 ns units
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0 //window size: 25 ms
USEHAMMING = T //Hamming window
PREEMCOEF = 0.97 //pre-emphasis coefficient
NUMCHANS = 26 //number of filterbank channels
CEPLIFTER = 22 //cepstral liftering coefficient
NUMCEPS = 12 //number of cepstral coefficients
ENORMALISE = F //energy normalization
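HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns. The short Python sketch below reproduces the three values used in the configuration file; the function names are made up for this illustration and are not part of HTK.

```python
# HTK time parameters are given in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000
HTK_UNITS_PER_MILLISECOND = 10_000


def source_rate(sample_rate_hz):
    """Sample period in 100 ns units (value for SOURCERATE)."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def ms_to_htk_units(milliseconds):
    """Duration in 100 ns units (value for TARGETRATE or WINDOWSIZE)."""
    return milliseconds * HTK_UNITS_PER_MILLISECOND


print("SOURCERATE =", source_rate(16_000))   # 16 kHz sampling -> 625.0
print("TARGETRATE =", ms_to_htk_units(10))   # 10 ms frame shift -> 100000
print("WINDOWSIZE =", ms_to_htk_units(25))   # 25 ms window -> 250000
```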

Feature Vector Generation

• Script file scripts/Hcopy.scp
../Data/MBTG0/pbw1/pbw1001.wav ../Data/MBTG0/pbw1/pbw1001.mfc
../Data/MBTG0/pbw1/pbw1002.wav ../Data/MBTG0/pbw1/pbw1002.mfc
………….
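Writing Hcopy.scp by hand for 452 words × 7 sets is tedious, so the list can be generated. The following is a minimal Python sketch, assuming each feature file sits next to its wave file with the extension .mfc; the helper name `hcopy_script_lines` is hypothetical.

```python
from pathlib import PurePosixPath


def hcopy_script_lines(wav_paths):
    """Return one 'source target' line per wave file, as HCopy -S expects."""
    lines = []
    for wav in wav_paths:
        wav = PurePosixPath(wav)
        # The feature file keeps the same folder and name, with .mfc instead of .wav.
        lines.append(f"{wav} {wav.with_suffix('.mfc')}")
    return lines


wavs = ["../Data/MBTG0/pbw1/pbw1001.wav", "../Data/MBTG0/pbw1/pbw1002.wav"]
print("\n".join(hcopy_script_lines(wavs)))
```

The printed lines can be saved as scripts/Hcopy.scp; keeping only the second column of each line gives a list of .mfc files in the form used by train.scp and test.scp.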

Data Preparation

• Master Label File: word-level transcription, mlfs/words.mlf
– pbw1001.wav : "*/pbw1001.lab" sil 청와대 sil .
– pbw2001.wav : "*/pbw2001.lab" sil 청와대 sil .
  => both are covered by the pattern "*/pbw*001.lab" sil 청와대 sil .

Annotated example (the // notes are slide annotations, not part of the file):
#!MLF!#
"*/pbw1001.lab" //label index: [wave file name].lab
sil //silence
청와대 //utterance
sil
. //end of label
"*/pbw2001.lab" //wildcards are allowed
sil
청와대
sil
.

Explicit form, one entry per file:
#!MLF!#
"*/pbw1001.lab"
sil
청와대
sil
.
"*/pbw2001.lab"
sil
청와대
sil
.
"*/pbw1002.lab"
sil
컴퓨터
sil
.
"*/pbw2002.lab"
sil
컴퓨터
sil
.
...

Wildcard form, one entry per word:
#!MLF!#
"*/pbw*001.lab"
sil
청와대
sil
.
"*/pbw*002.lab"
sil
컴퓨터
sil
.
..
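The wildcard form of the master label file follows a fixed pattern per word, so it can be produced from a small table. The sketch below is one way to do that in Python; `build_word_mlf` is a hypothetical helper, and the text encoding used when saving the file is left to the reader because the slides do not state it.

```python
def build_word_mlf(transcriptions):
    """Build MLF text from a mapping of label pattern -> word.

    Each entry becomes: quoted pattern, sil, word, sil, terminating period.
    """
    lines = ["#!MLF!#"]
    for pattern, word in transcriptions.items():
        lines.append(f'"{pattern}"')
        lines.extend(["sil", word, "sil", "."])
    return "\n".join(lines) + "\n"


mlf_text = build_word_mlf({
    "*/pbw*001.lab": "청와대",
    "*/pbw*002.lab": "컴퓨터",
})
print(mlf_text)
```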

Data Preparation

• Model List
– modelList/wordList
– List of HMM model names, one per line
sil
청와대
컴퓨터
그에게
위대한
당뇨병
그야말로
예컨대
분야에서
어두운
소프트웨어
됐습니다
아니냐는
야당의

Generation of an initial Hmmdefs(master macro file)

• Scan a set of data files (train.scp), compute the global mean and variance, and write a new HMM prototype (m0/proto) in which every state of the input prototype (proto) is set to that mean and variance.
• HCompV.exe
• input:
-C configs/config //parameters for computing the features
-f 0.01 //produces a variance floor macro (vFloors) with values of 0.01 times the global variance
-m //update the means as well as the variances
-S scripts/train.scp //list of mfc feature files used for training
wordHmms/proto //handwritten HMM prototype
• output:
-M wordHmms/m0 //directory for the results
  //vFloors: variance floor macro
  //proto: HMM prototype with the global mean and variance filled in
  //hmmdefs: written manually afterwards from proto

HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M wordHmms/m0/ wordHmms/proto
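To make the effect of -m and -f 0.01 concrete, the sketch below computes a global mean, a global variance, and a variance floor over a handful of toy feature frames in plain Python. It illustrates the arithmetic only and is not HCompV's implementation; dividing by the number of frames (rather than N - 1) is an assumption of this sketch.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all frames (variance divides by N)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var


def variance_floor(variances, factor=0.01):
    """Variance floor as a fraction of the global variance (the -f value)."""
    return [factor * v for v in variances]


# Two toy 2-dimensional frames; real frames would have 39 dimensions.
frames = [[1.0, 2.0], [3.0, 6.0]]
mean, var = global_mean_variance(frames)
print(mean, var, variance_floor(var))
```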

Generation of an initial Hmmdefs(master macro file)

• Write a general HMM model (prototype) for a single speech unit: wordHmms/proto
– 3-state left-to-right model
– wordHmms/proto
  • ~o feature vector definition: size 39, MFCC_0_D_A
  • ~h HMM model name: proto
  • Number of states: 5 (3 emitting states plus the non-emitting entry and exit states)
  • Gaussian mixture model: one component per state, with mean and variance
  • Transition probability matrix
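A prototype of this shape can be typed by hand or generated. The following Python sketch emits a 5-state, 39-dimensional prototype with zero means and unit variances; the 0.6/0.4 transition values are illustrative starting values chosen for this example (HCompV and HERest replace the statistics later), and `make_proto` is a hypothetical helper.

```python
def make_proto(vec_size=39, target_kind="MFCC_0_D_A", n_states=5, name="proto"):
    """Return the text of a left-to-right HMM prototype in HTK's text format."""
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [
        f"~o <VecSize> {vec_size} <{target_kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {n_states}",
    ]
    # States 2 .. n_states-1 are emitting; the first and last are non-emitting.
    for state in range(2, n_states):
        lines += [f"<State> {state}", f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    lines.append(f"<TransP> {n_states}")
    for i in range(n_states):
        row = [0.0] * n_states
        if i == 0:
            row[1] = 1.0                    # entry state always moves to state 2
        elif i < n_states - 1:
            row[i], row[i + 1] = 0.6, 0.4   # stay, or advance to the next state
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


print(make_proto())
```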

Generation of an initial Hmmdefs(master macro file)

• configs/config
– Copy configs/Hcopy.config to configs/config and change:
  • TARGETKIND : MFCC_0_D_A
# Coding parameters
NONUMESCAPES = T //for Korean handling
TARGETKIND = MFCC_0_D_A //12 MFCC + 1 energy (C0) + 13 delta + 13 acceleration = 39 (shown later)
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F

Generation of an initial Hmmdefs(master macro file)

• scripts/train.scp - file list for training


Generation of an initial Hmmdefs(master macro file)

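• wordHmms/m0/vFloors
– Variance floor macro (0.01 times the global variance)
~v varFloor1 <Variance> 39 7.217242e-001 3.275488e-001 … ….
– Global constant (GCONST) used when computing the output probability $b_j(o_t)$:

$$b_j(o_t) = \mathcal{N}(o_t;\, \mu_{jk}, \Sigma_{jk}) = \frac{1}{\sqrt{(2\pi)^n\,|\Sigma_{jk}|}}\; e^{-\frac{1}{2}(o_t-\mu_{jk})^{\top}\Sigma_{jk}^{-1}(o_t-\mu_{jk})}$$

$$\mathrm{GCONST} = \ln\!\left((2\pi)^n\,|\Sigma_{jk}|\right)$$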
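With diagonal covariances, as declared by <DIAGC>, the determinant is the product of the variances, so GCONST and the log output probability reduce to simple sums. The Python sketch below shows that arithmetic for a single Gaussian; it is an illustration of the formula, not HTK code, and the function names are made up for this example.

```python
import math


def gconst(variances):
    """ln((2*pi)^n * |Sigma|) for a diagonal covariance matrix."""
    n = len(variances)
    return n * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def log_output_prob(o, mean, variances):
    """ln b_j(o_t) for one diagonal-covariance Gaussian."""
    mahalanobis = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, variances))
    return -0.5 * (gconst(variances) + mahalanobis)


# One-dimensional standard normal evaluated at its mean: ln(1 / sqrt(2*pi)).
print(log_output_prob([0.0], [0.0], [1.0]))
```

Precomputing GCONST once per Gaussian means that only the Mahalanobis term has to be evaluated for each frame.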

Generation of an initial Hmmdefs(master macro file)

• wordHmms/m0/proto
• wordHmms/proto + global means and variances => wordHmms/m0/proto

Generation of an initial Hmmdefs(master macro file)

• Write wordHmms/m0/hmmdefs – Master Macro File (MMF): one copy of the m0/proto model per word, each renamed with ~h
hmmdefs
~o <STREAMINFO> 1 39 <VECSIZE> 39<MFCC_D_A_0><DIAGC>
~h "proto" <BEGINHMM> .. <ENDHMM>
~v varFloor1 <Variance> 39 7.2 … …..
~h "청와대" <BEGINHMM> … <ENDHMM>
~h "그에게" <BEGINHMM> … <ENDHMM>
…

wordHmms/m0/proto
~o <STREAMINFO> 1 39 <VECSIZE> 39<NULLD><MFCC_D_A_0><DIAGC>
~h "proto"
<BEGINHMM>
<NUMSTATES> 5
<STATE> 2
<MEAN> 39
-9.954108e+000 4.561644e-001 1.407761e+000 -4.952329e+000 -4.900678e+000 …
<VARIANCE> 39
7.217242e+001 3.275488e+001 6.895670e+001 6.279921e+001 7.020441e+001 …
<GCONST> 1.280699e+002
<STATE> 3
<MEAN> 39
-9.954108e+000 4.561644e-001 1.407761e+000 -4.952329e+000 -4.900678e+000 …
<VARIANCE> 39
7.217242e+001 3.275488e+001 6.895670e+001 6.279921e+001 7.020441e+001 …
<GCONST> 1.280699e+002
<STATE> 4
<MEAN> 39
-9.954108e+000 4.561644e-001 1.407761e+000 -4.952329e+000 -4.900678e+000 …
<VARIANCE> 39
7.217242e+001 3.275488e+001 6.895670e+001 6.279921e+001 7.020441e+001 …
<GCONST> 1.280699e+002
<TRANSP> 5
0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
….
0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000
<ENDHMM>
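Copying the prototype by hand for every word is error-prone, so the manual step can be scripted. The following Python sketch assembles hmmdefs text from the contents of m0/proto, m0/vFloors and the word list. It assumes the prototype contains the literal line ~h "proto" and, as a simplification, leaves the "proto" model itself out of the result; `build_hmmdefs` is a hypothetical helper.

```python
def build_hmmdefs(proto_text, vfloors_text, model_names):
    """Combine the ~o header, the variance floor macro and one HMM per name."""
    header, marker, body = proto_text.partition('~h "proto"')
    if not marker:
        raise ValueError('prototype does not contain ~h "proto"')
    parts = [header.rstrip("\n"), vfloors_text.strip("\n")]
    for name in model_names:
        # Same states and transitions as the prototype, under a new model name.
        parts.append(f'~h "{name}"' + body.rstrip("\n"))
    return "\n".join(parts) + "\n"


# Tiny stand-ins for the real files, to show the resulting layout.
proto = '~o <VECSIZE> 2\n~h "proto"\n<BEGINHMM>\n<ENDHMM>\n'
vfloors = '~v varFloor1\n<Variance> 2\n0.1 0.1\n'
print(build_hmmdefs(proto, vfloors, ["sil", "청와대"]))
```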

Training

• Embedded training
• HERest
• input:
-C configs/config //feature parameters
-I mlfs/words.mlf //master label file: word labels for each speech file
-S scripts/train.scp //list of mfc files used for training
-H wordHmms/m0/hmmdefs //hmmdefs (a set of HMM prototypes) for all words
modelList/wordList //word name list (HMM list)
• output:
-M wordHmms/m1 //directory for the re-estimated hmmdefs

HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m0/hmmdefs -M wordHmms/m1 modelList/wordList


Training

[Figure: contents of mlfs/words.mlf, scripts/train.scp and modelList/wordList]

Training

• wordHmms/m0/hmmdefs
– Master Macro File hmmdefs
– Contains a set of prototype HMMs for all words
• hmmdefs
~o <STREAMINFO> 1 39 <VECSIZE> 39<MFCC_D_A_0><DIAGC>
~h "proto" <BEGINHMM> .. <ENDHMM>
~v varFloor1 <Variance> 39 7.2 … …..
~h "청와대" <BEGINHMM> … <ENDHMM>
~h "그에게" <BEGINHMM> … <ENDHMM>
…

Training

• [output] wordHmms/m1/hmmdefs
• hmmdefs
~o <STREAMINFO> 1 39 <VECSIZE> 39<MFCC_D_A_0><DIAGC>
~v varFloor1 <Variance> 39 7.2 … …..
~h "청와대" <BEGINHMM> … <ENDHMM>
~h "그에게" <BEGINHMM> … <ENDHMM>
…

Training

• Re-estimate hmmdefs four more times (m1 => m2 => … => m5)
• Resulting model folder:
– D:/htk32/dev/wordhmms/

HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m1/hmmdefs -M wordHmms/m2 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m2/hmmdefs -M wordHmms/m3 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m3/hmmdefs -M wordHmms/m4 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m4/hmmdefs -M wordHmms/m5 modelList/wordList
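The repeated re-estimation passes differ only in the input and output model folders, so they can be driven by a loop. The Python sketch below builds the argument lists for the passes from m0 up to m5; the helper name is hypothetical, and actually launching HERest (for example with `subprocess.run(cmd, check=True)`) assumes the HTK tools are on the path and that each output folder already exists.

```python
def herest_commands(first=0, last=5):
    """Argument lists for re-estimating m<first> -> ... -> m<last>."""
    commands = []
    for i in range(first, last):
        commands.append([
            "HERest",
            "-C", "configs/config",
            "-I", "mlfs/words.mlf",
            "-S", "scripts/train.scp",
            "-H", f"wordHmms/m{i}/hmmdefs",   # models from the previous pass
            "-M", f"wordHmms/m{i + 1}",       # folder for the re-estimated models
            "modelList/wordList",
        ])
    return commands


for cmd in herest_commands():
    print(" ".join(cmd))
```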

Recognition Test

– HVite
• input:
-C configs/config //feature parameters
-S scripts/test.scp //list of mfc files used for testing
-H wordHmms/m5/hmmdefs //set of HMMs
-w dic/pbwNet //word network for recognition
dic/dict //pronouncing dictionary: word [outsym] models
modelList/wordList //HMM name list
• output:
-i mlfs/recOutWordm5.mlf //recognition result

HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/pbwNet -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList

Recognition Test

• dic/dict: writing a pronouncing dictionary
• word [outsym] models
– word : word to be recognized
– [outsym] : string to output when the word is recognized
– models : list of HMM models
• Ex)
• Dictionary for word-level recognition
– 청와대 [청와대] 청와대
  or 청와대 청와대
• Dictionary for phoneme-level recognition
– 청와대 [청와대] ㅊ ㅓ ㅇ ㅘ ㄷ ㅐ
  or 청와대 ㅊ ㅓ ㅇ ㅘ ㄷ ㅐ
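For word-level recognition every dictionary entry maps a word to the HMM of the same name, so dic/dict can be derived from the word list. A minimal Python sketch follows; sorting the entries is a precaution taken in this example rather than something the slides require, and `word_level_dict` is a hypothetical helper.

```python
def word_level_dict(words):
    """One 'word [outsym] model' line per word, all three fields identical."""
    return "".join(f"{w} [{w}] {w}\n" for w in sorted(words))


print(word_level_dict(["청와대", "컴퓨터", "sil"]))
```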

Recognition Test

Example of a grammar and network

• Grammar (digits 영 … 구 are zero to nine; the sentences mean "call number <digits>" or "call <name>")
$digit = 영 | 일 | 이 | 삼 | 사 | 오 | 육 | 칠 | 팔 | 구 ;
$name = [박] 산다라 | [태] 현 ;
{ sil ( <$digit> 번에 | $name 에게 ) 전화 ( 해 | 걸어 ) sil }

• Network

• Grammar rules
$ : variable
| : alternatives
{} : zero or more repetitions
<> : one or more repetitions
[] : optional

Recognition Test

• Generation of dic/pbwNet
• HParse
– input : dic/pbwGram, -C configs/config
– output : dic/pbwNet

HParse dic/pbwGram dic/pbwNet
or
HParse -C configs/config dic/pbwGram dic/pbwNet

Recognition Test

Writing a grammar file (dic/pbwGram)

HParse -C configs/config dic/pbwGram dic/pbwNet
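The contents of dic/pbwGram are not reproduced in this text. As an illustration only, the Python sketch below writes an isolated-word grammar in which one word from the list is spoken between two silences, which matches the "sil word sil" transcriptions in words.mlf; the grammar shape and the helper name are assumptions of this example, not the original file.

```python
def isolated_word_grammar(words):
    """HParse-style grammar: silence, exactly one word, silence (assumed shape)."""
    alternatives = " | ".join(words)
    return f"$word = {alternatives} ;\n( sil $word sil )\n"


print(isolated_word_grammar(["청와대", "컴퓨터", "그에게"]))
```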

Recognition Test

[Figure: contents of dic/pbwNet and configs/config]

Recognition Test

[Figure: contents of configs/config, scripts/test.scp and modelList/wordList]

Recognition Test

• mlfs/recOutWordm5.mlf (master label format)


Recognition Test

• Analyze the recognition result
• Sentence level (H = hits, D = deletions, S = substitutions, I = insertions, N = total)
– 452 words × 7 sets = 3164 utterances
– Correct rate : 2998/3164 ≈ 0.948
• Word level (including the two sil labels per utterance)
– 452 words × 3 × 7 sets = 9492 labels
– Correct rate : 9326/9492 ≈ 0.983

HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
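The two correct rates follow directly from the hit counts and totals quoted above. The short Python sketch below redoes the arithmetic; the function names are made up for this illustration, and the accuracy variant that also subtracts insertions is included only for comparison, since the slides do not give insertion counts.

```python
def percent_correct(hits, total):
    """Percent correct: H / N * 100."""
    return 100.0 * hits / total


def percent_accuracy(hits, insertions, total):
    """Percent accuracy: (H - I) / N * 100."""
    return 100.0 * (hits - insertions) / total


print(round(percent_correct(2998, 3164), 2))   # sentence level
print(round(percent_correct(9326, 9492), 2))   # word level, sil labels included
```

Both levels show the same 166 errors (3164 - 2998 = 9492 - 9326); the word-level rate is higher only because the sil labels enlarge the total.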

Summary

• Feature vector generation
  input: Data/pbw****.wav, configs/Hcopy.config, scripts/Hcopy.scp
  – HCopy.exe
  output: Data/pbw****.mfc
• Generation of an initial hmmdefs (master macro file)
  input: wordHmms/proto, scripts/train.scp, configs/config
  – HCompV.exe
  output: wordHmms/m0/proto, wordHmms/m0/vFloors
  – => manually add wordHmms/m0/hmmdefs, wordHmms/m0/macros
• Training
  input: wordHmms/m0/hmmdefs, scripts/train.scp, modelList/wordList, mlfs/words.mlf, configs/config
  – HERest.exe
  output: wordHmms/m1/hmmdefs
• Repeat training 4 more times to generate wordHmms/m5/hmmdefs

Summary(cont.)

• Generation of the word network
  input: dic/pbwGram, configs/config
  – HParse.exe
  output: dic/pbwNet
• Recognition test
  input: scripts/test.scp, wordHmms/m5/hmmdefs, dic/pbwNet, dic/dict, modelList/wordList, configs/config
  – HVite.exe
  output: mlfs/recOutWordm5.mlf
• Analysis of the recognition result
  input: mlfs/words.mlf, modelList/wordList, mlfs/recOutWordm5.mlf
  – HResults.exe

Command List at D:/HTK32/YJK

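[Figure: folder tree with the files in each folder; folder names restored from the folder structure given earlier]
binwin32 : HCopy.exe, HCompV.exe, …
Data : pbw1001.wav, pbw1001.mfc, pbw1002.wav, …
configs : config, Hcopy.config
dic : dict, pbwGram, pbwNet
mlfs : words.mlf, recOutWordm5.mlf
modelList : wordList
scripts : Hcopy.scp, train.scp, test.scp
wordHmms : proto
  m0 : hmmdefs, proto, vFloors
  m1 : hmmdefs
  m2 : hmmdefs
  m3 : hmmdefs
  m4 : hmmdefs
  m5 : hmmdefs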

Folder and Files for this example

• configs/
  Hcopy.config //handwritten
  config //handwritten; TARGETKIND changed to MFCC_0_D_A
• scripts/
  Hcopy.scp //handwritten: pbw1001.wav pbw1001.mfc
  train.scp //handwritten: list of .mfc files
• mlfs/
  words.mlf //handwritten: "*/pbw1001.lab" sil 청와대 sil .
  recOutWordm5.mlf //written by HVite
• modelList/
  wordList //handwritten: sil 청와대 컴퓨터 그에게 …
• wordHmms/
  proto //handwritten, HMM type
  m0/ vFloors, proto, hmmdefs //hmmdefs handwritten
  m1/ hmmdefs
  …
  m5/ hmmdefs
• dic/
  dict //word [outsym] models
  pbwGram
  pbwNet