The HTK Book (for HTK Version 3.2.1) Young et al., 2002

Chapter 1: The Fundamentals of HTK

HTK is a toolkit for building hidden Markov models (HMMs).

It is primarily used to build automatic speech recognition (ASR) systems, but also other HMM-based systems: speaker and image recognition, automatic text summarization, etc.

HTK has tools (modules) for both training and testing HMM systems.

How to Train and Test an ASR?

Things needed: A labeled speech corpus and a dictionary (+ grammar).

Procedure:
1. Divide corpus into training, development and test sets.
2. Train acoustic models.
3. Test, retrain, test … on the development set.
4. Test on the test data.
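
The first step of the procedure can be sketched as follows. This is a minimal illustration, not part of HTK: the function name, the 10% development and test fractions, and the utterance IDs are all assumptions.

```python
import random

def split_corpus(utterance_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle utterance IDs and split them into train/dev/test lists.

    The fractions and the fixed seed are illustrative assumptions.
    """
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_dev = round(len(ids) * dev_fraction)
    n_test = round(len(ids) * test_fraction)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test

# Hypothetical corpus of 100 utterances.
train, dev, test = split_corpus([f"utt{i:03d}" for i in range(100)])
print(len(train), len(dev), len(test))
```

In practice the split is often done by speaker rather than by utterance, so that no test speaker is seen in training.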

How to Build an ASR Using HTK?

Goal: A recognizer for voice dialing.

( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
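
In this grammar notation, vertical bars separate alternatives and angle brackets mean one or more repetitions. The sketch below generates random sentences that the grammar accepts. The contents of $digit and $name are not given on the slide, so the word lists here are assumptions, as are the function name and the cap on digit-string length.

```python
import random

# Assumed variable contents; the slide does not define $digit or $name.
DIGITS = ["ONE", "TWO", "THREE", "OH", "ZERO"]
NAMES = ["JANSEN", "ODELL", "YOUNG"]

def random_sentence(rng, max_digits=7):
    """Return one random word sequence allowed by the voice dialing grammar."""
    words = ["SENT-START"]
    if rng.random() < 0.5:
        # DIAL <$digit>: one or more digits (capped here for convenience)
        words.append("DIAL")
        count = rng.randint(1, max_digits)
        words.extend(rng.choice(DIGITS) for _ in range(count))
    else:
        # (PHONE|CALL) $name
        words.append(rng.choice(["PHONE", "CALL"]))
        words.append(rng.choice(NAMES))
    words.append("SENT-END")
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```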

Creating a Dictionary

HDMan builds the pronunciation dictionary and outputs a list of the phones it uses. An HMM will be estimated for each of these phones.
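
The idea of deriving a phone list from a dictionary can be sketched as below. This is not HDMan itself: the function name and the example entries are assumptions, and each line is assumed to hold a word, an optional output symbol in square brackets, and then the phones.

```python
def phone_set(dict_lines):
    """Collect the sorted set of phones used in dictionary lines.

    Assumed line layout: WORD [optional output symbol] phone phone ...
    """
    phones = set()
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Drop the word itself and any bracketed output symbol.
        phones.update(f for f in fields[1:] if not f.startswith("["))
    return sorted(phones)

# Illustrative entries only.
example = ["CALL k ao l", "PHONE f ow n", "SENT-END [] sil"]
print(phone_set(example))
```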

Recording the Data

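Record the speech with HSLab (HSLab noname). Use HSGen to generate random test prompts (testprompts) from the word network (wdnet) and the dictionary (dict).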

Transcribing the Data

HMM training is supervised learning, so every training utterance needs a transcription.

Coding the Data

HTK supports frame-based parameterizations: FFT-based, LPC, MFCC, user-defined, etc.
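
"Frame-based" means the waveform is cut into short overlapping windows and one feature vector is computed per window. A minimal sketch of the framing step, assuming a 25 ms window and a 10 ms shift (common choices, not values taken from the slides); the function name is hypothetical.

```python
def frame_signal(samples, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Cut a sample sequence into overlapping fixed-length frames.

    Window and shift durations are assumptions for illustration.
    Trailing samples that do not fill a whole window are dropped.
    """
    win = int(sample_rate * window_ms / 1000)
    hop = int(sample_rate * shift_ms / 1000)
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(samples[start:start + win])
        start += hop
    return frames

# One second of placeholder audio at 16 kHz.
signal = list(range(16000))
frames = frame_signal(signal, 16000)
print(len(frames), len(frames[0]))
```

Each frame would then be passed to a feature extractor (for example an MFCC computation), which is outside the scope of this sketch.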

Output Probability Specification

The most common choice is the continuous-density HMM (CDHMM). HTK also allows discrete output probabilities (for vector-quantized (VQ) data).
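
In a continuous-density HMM, each state typically scores an observation with a mixture of Gaussians. The sketch below computes that score in the log domain, assuming diagonal covariances; the function names are hypothetical and this is not HTK code.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_mixture(x, weights, means, variances):
    """Log of a weighted sum of diagonal Gaussians (log-sum-exp for stability)."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))

# Two-component mixture over 2-dimensional observations (made-up parameters).
score = log_mixture(
    [0.5, -0.2],
    weights=[0.4, 0.6],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
print(score)
```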

Flat Start Training

Build a prototype HMM, then use HCompV to give its parameters reasonable initial values (the global mean and variance of the training data).

Specify the topology: usually left-to-right with 3 states and no skips.

Create a master macro file (MMF), then use HRest or HERest for training.
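
The two ingredients of a flat start can be sketched as follows: a global mean and variance shared by every state, and a left-to-right transition matrix with no skips. This illustrates the idea only and is not HCompV; the function names and the 0.6 self-loop probability are assumptions. The matrix includes non-emitting entry and exit states around the emitting ones.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all training frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

def left_to_right_transitions(num_emitting=3, self_loop=0.6):
    """Transition matrix: entry state, emitting states, exit state; no skips."""
    n = num_emitting + 2
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0][1] = 1.0  # entry state always moves to the first emitting state
    for i in range(1, n - 1):
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return matrix

mean, var = global_mean_variance([[1.0, 2.0], [3.0, 4.0]])
print(mean, var)
for row in left_to_right_transitions():
    print(row)
```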

Realigning and Creating Triphones

Use a pseudo-recognition pass to force-align the training data when words have multiple pronunciations.
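
Creating triphones means relabeling each phone with its left and right neighbors, written l-p+r. A minimal sketch of a word-internal conversion, assuming that sil and sp mark boundaries and carry no context; the function name and the example phone string are illustrative.

```python
def to_triphones(phones, boundaries=("sil", "sp")):
    """Convert a monophone sequence to word-internal triphone labels (l-p+r).

    Boundary symbols are kept as they are and do not act as context.
    """
    out = []
    for i, phone in enumerate(phones):
        if phone in boundaries:
            out.append(phone)
            continue
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        name = phone
        if left is not None and left not in boundaries:
            name = f"{left}-{name}"
        if right is not None and right not in boundaries:
            name = f"{name}+{right}"
        out.append(name)
    return out

print(to_triphones(["sil", "th", "ih", "s", "sp", "m", "ae", "n", "sp"]))
```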

Evaluation
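
Recognition output is usually scored by aligning it with the reference transcription and counting substitutions (S), deletions (D) and insertions (I). With N reference words, percent correct is (N - D - S) / N and accuracy is (N - D - S - I) / N. The sketch below uses uniform edit costs, which is an assumption; a real scoring tool may weight the error types differently. Function names are hypothetical.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (cost, substitutions, deletions, insertions).
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, rows):
        for j in range(1, cols):
            c, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag_cell = (c, s, d, n)          # match
            else:
                diag_cell = (c + 1, s + 1, d, n)  # substitution
            c, s, d, n = table[i - 1][j]
            del_cell = (c + 1, s, d + 1, n)
            c, s, d, n = table[i][j - 1]
            ins_cell = (c + 1, s, d, n + 1)
            table[i][j] = min(diag_cell, del_cell, ins_cell, key=lambda t: t[0])
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins

def score(ref, hyp):
    """Return (percent correct, percent accuracy) for one utterance."""
    subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    correct = 100.0 * (n - dels - subs) / n
    accuracy = 100.0 * (n - dels - subs - ins) / n
    return correct, accuracy

print(score("DIAL ONE TWO THREE".split(), "DIAL ONE THREE".split()))
```

Accuracy is the stricter figure because it also penalizes inserted words.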

Other Issues

HTK supports supervised and unsupervised speaker adaptation (HVite).

Language modeling: HTK supports n-gram language models.
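
An n-gram model predicts each word from the n-1 words before it. A minimal sketch for n = 2, using unsmoothed maximum-likelihood estimates; the function name and the training sentences are assumptions, and a practical model would add smoothing or back-off.

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood bigram probabilities P(next | previous), no smoothing."""
    history = Counter()
    pairs = Counter()
    for words in sentences:
        for prev, nxt in zip(words, words[1:]):
            history[prev] += 1       # times prev appears as a history
            pairs[(prev, nxt)] += 1  # times nxt follows prev
    return {pair: count / history[pair[0]] for pair, count in pairs.items()}

# Made-up training sentences.
probs = bigram_probs([
    ["SENT-START", "CALL", "YOUNG", "SENT-END"],
    ["SENT-START", "DIAL", "ONE", "SENT-END"],
])
print(probs[("SENT-START", "CALL")])
```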