The HTK Book (for HTK Version 3.2.1)
Young et al., 2002
Chapter 1: The Fundamentals of HTK
HTK is a toolkit for building hidden Markov models (HMMs).
It is primarily used to build automatic speech recognizers (ASRs), but also other HMM-based systems: speaker and image recognition, automatic text summarization, etc.
HTK has tools (modules) for both training and testing HMM systems.
How to Train and Test an ASR?
Things needed: A labeled speech corpus and a dictionary (+ grammar).
Procedure:
1. Divide the corpus into training, development and test sets.
2. Train acoustic models.
3. Test, retrain, test … on the development set.
4. Test on the test data.
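The first step of the procedure can be sketched as follows. This is a minimal illustration, not part of HTK: the function name `split_corpus`, the 10% development and test fractions, and the utterance IDs are all assumptions chosen for the example.

```python
import random

def split_corpus(utterance_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle utterance IDs and split them into train/dev/test lists."""
    ids = sorted(utterance_ids)            # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)       # fixed seed: same split on every run
    n_dev = int(len(ids) * dev_fraction)
    n_test = int(len(ids) * test_fraction)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]           # everything left over is training data
    return train, dev, test

# Hypothetical utterance IDs, for illustration only.
train, dev, test = split_corpus([f"utt{i:03d}" for i in range(100)])
print(len(train), len(dev), len(test))
```

The test list is only touched once, at the very end; all tuning happens on the development list.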
How to Build an ASR Using HTK?
Goal: A recognizer for voice dialing.
( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
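To make the grammar concrete, here is a small sketch that generates random word sequences it accepts. In HTK grammar notation `|` separates alternatives and `< >` means one or more repetitions. The slide does not define `$digit` or `$name`, so the word lists below are hypothetical placeholders, and the cap of seven digits is an arbitrary choice for the example.

```python
import random

# Hypothetical vocabularies: the slide does not define $digit or $name.
DIGITS = ["ONE", "TWO", "THREE"]
NAMES = ["ALICE SMITH", "BOB JONES"]

def random_sentence(rng):
    """Generate one word sequence accepted by the voice-dialing grammar."""
    words = ["SENT-START"]
    if rng.random() < 0.5:
        # DIAL <$digit>: one or more digits
        words.append("DIAL")
        n_digits = rng.randint(1, 7)       # arbitrary upper bound for the example
        words.extend(rng.choice(DIGITS) for _ in range(n_digits))
    else:
        # (PHONE|CALL) $name
        words.append(rng.choice(["PHONE", "CALL"]))
        words.extend(rng.choice(NAMES).split())
    words.append("SENT-END")
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```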
Creating a Dictionary
HDMan builds the pronunciation dictionary and outputs a list of the phones used. An HMM will be estimated for each of these phones.
Recording the Data
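Record with HSLab (HSLab noname). Generate the prompts to read with HSGen, which takes the word network and dictionary (wdnet, dict) and writes the prompts to testprompts.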
Transcribing the Data
HMM training is supervised learning.
Coding the Data
HTK supports frame-based parameterizations: FFT-based, LPC, MFCC, user-defined, etc.
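"Frame-based" means the waveform is cut into short overlapping windows, and one feature vector is computed per window. A minimal sketch of the framing step only, assuming a 25 ms window and a 10 ms shift (common choices, not values taken from the slides); the function name `frame_signal` is hypothetical.

```python
def frame_signal(samples, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Split a sample sequence into overlapping fixed-length frames."""
    win = int(sample_rate * window_ms / 1000)   # samples per frame
    hop = int(sample_rate * shift_ms / 1000)    # samples between frame starts
    frames = []
    start = 0
    while start + win <= len(samples):          # drop the incomplete tail
        frames.append(samples[start:start + win])
        start += hop
    return frames

signal = [0.0] * 16000                          # one second of silence at 16 kHz
frames = frame_signal(signal, 16000)
print(len(frames), len(frames[0]))
```

Each frame would then be turned into a feature vector (for example MFCCs); that step is omitted here.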
Output Probability Specification
The most common choice is the continuous density HMM (CDHMM). HTK also allows discrete probabilities (for VQ data).
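For a continuous density HMM, the output probability of a state is typically a mixture of Gaussians. The sketch below evaluates such a mixture in the log domain, assuming diagonal covariances; the function names and the example numbers are illustrative, not HTK's own code.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total

def log_mixture(x, weights, means, variances):
    """Log output probability of a state modeled as a Gaussian mixture."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                              # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

# Two-component mixture over 2-dimensional observations (made-up parameters).
print(log_mixture([0.0, 0.0],
                  [0.5, 0.5],
                  [[0.0, 0.0], [1.0, 1.0]],
                  [[1.0, 1.0], [1.0, 1.0]]))
```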
Flat Start Training
Build a prototype HMM and give its parameters reasonable initial values: HCompV sets every state to the global mean and variance of the training data.
Specify the topology: usually left-to-right with 3 states and no skips.
Create an MMF (master macro file). Then use HRest or HERest for training.
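The two ingredients of a flat start can be sketched in a few lines: global statistics shared by every state, and a left-to-right transition matrix with no skips. HTK models carry non-emitting entry and exit states, so a "3 state" model has 5 rows here. The function names and the 0.6 self-loop probability are assumptions for illustration.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all training frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

def left_to_right_transitions(n_emitting=3, stay=0.6):
    """Transition matrix with non-emitting entry/exit states and no skips."""
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0                      # entry state always moves to the first emitting state
    for i in range(1, n - 1):
        a[i][i] = stay                 # self-loop
        a[i][i + 1] = 1.0 - stay       # advance to the next state only
    return a                           # the exit state's row stays all zero

mean, var = global_mean_var([[1.0, 2.0], [3.0, 4.0]])
for row in left_to_right_transitions():
    print(row)
```

In a flat start every emitting state begins with the same `mean` and `var`; re-estimation then lets the states diverge.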
Realigning and Creating Triphones
Use pseudo-recognition (forced alignment) to realign the training data; for words with multiple pronunciations, this picks the pronunciation that best matches the audio.
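Triphone labels use the form left-phone+right, where the left or right context is omitted at the edge of the sequence. A minimal sketch of that relabeling, assuming a plain phone list; it does not reproduce any special treatment of silence or short-pause models, and `to_triphones` is a hypothetical name.

```python
def to_triphones(phones):
    """Convert a phone sequence into context-dependent triphone labels."""
    out = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""                 # left context, if any
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""  # right context, if any
        out.append(left + p + right)
    return out

print(to_triphones(["th", "ih", "s"]))   # ['th+ih', 'th-ih+s', 'ih-s']
```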
Evaluation
Other Issues
HTK supports supervised and unsupervised speaker adaptation (HVite).
Language modeling: HTK supports n-gram language models.
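As an illustration of what an n-gram model estimates, here is a sketch of maximum-likelihood bigram probabilities computed from counts. It uses no smoothing or back-off, which a practical model would need; the function name and the toy corpus are made up for the example.

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood bigram estimates P(w2 | w1) from tokenized sentences."""
    pair_counts = Counter()
    history_counts = Counter()
    for words in sentences:
        for w1, w2 in zip(words, words[1:]):
            pair_counts[(w1, w2)] += 1     # count of the word pair
            history_counts[w1] += 1        # count of w1 as a history
    return {pair: c / history_counts[pair[0]] for pair, c in pair_counts.items()}

# Toy corpus, for illustration only.
corpus = [["SENT-START", "DIAL", "ONE", "SENT-END"],
          ["SENT-START", "CALL", "BOB", "SENT-END"]]
probs = bigram_probs(corpus)
print(probs[("SENT-START", "DIAL")])
```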