Seminar

Speech Recognition Projects

E.M. Bakker

LIACS Media Lab

Leiden University

Project Outline

• Implementation

• Project Modules
  – Speech Database
  – Speech Signal Analysis
  – Hidden Markov Models + Training
  – Language Models + Training
  – Recognition Algorithms

• Evaluation

Implementation

• A Safe C++ Programming Style
  – Not to be used in C++
  – Syntax and Programming Style
  – Conventions
  – Basic Design Rules

• Program Services

• Memory Services

• Diagnostics

• Important Topics
  – Portability
  – Testing
  – Reliability

Implementation: A Safe C++ Programming Style

• Features to be avoided, or not to be used in C++
  – C inherited features: if(c=0), ?:, the comma operator (,), goto, break, continue, union, struct, bit-wise, (&& || !), int, short, double, unsigned, ++, --, explicit constant numbers, cast, variable argument lists
  – Preprocessor features: macros for constants, macros for functions, #pragma, compiler/platform specific directives
  – Object Oriented: global data, global non-member functions, public data, friend, overloading operators @, ++, ...
  – Memory and pointer-related: pointers, new, delete, malloc, free(), pointers to functions, ->, ->*, .*, const char*, NULL, type &ref - t, type count[], type *count, type *count[], type (*count)[], type (&count)
  – printf, scanf, assembly language, objects passed by value and temporary objects

Implementation: Syntax and Programming Style

• Programs in plain English

• Meaningful names

• One statement per line

• const: for data and methods whenever possible

• variables: local whenever possible

• private/protected data members whenever possible

• do not use confusing syntax like
  – if (a)
  – for (I=0;I++<4;)

• always use a default case in switch statements

• use assert at all critical points
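
The rules above can be illustrated with a minimal sketch. The names PhoneClass, Class_Name and Mean_Energy are hypothetical and are not taken from the RES code. The sketch shows const data, local variables, one statement per line, a default case in the switch, and assert at critical points; it does not try to respect every entry of the "features to be avoided" list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical phone classes, used only for illustration.
enum class PhoneClass { VOWEL, CONSONANT, SILENCE };

// const parameter, one statement per line, default case always present.
std::string Class_Name(const PhoneClass phone_class)
{
    std::string name;
    switch (phone_class)
    {
        case PhoneClass::VOWEL:
            name = "vowel";
            break;
        case PhoneClass::CONSONANT:
            name = "consonant";
            break;
        case PhoneClass::SILENCE:
            name = "silence";
            break;
        default:
            // Critical point: this branch should never be reached.
            assert(false && "unknown phone class");
            name = "unknown";
            break;
    }
    return name;
}

// Mean energy of one frame; the precondition is checked with assert.
double Mean_Energy(const std::vector<double>& frame)
{
    assert(!frame.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < frame.size(); i = i + 1)
    {
        const double sample = frame[i];
        sum = sum + sample * sample;
    }
    return sum / static_cast<double>(frame.size());
}
```

For a frame holding the samples 1, -1, 2, -2, Mean_Energy returns 2.5.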

Implementation: Conventions

• Functions and methods: My_Example_Function()

• Variables: my_example_var

• Classes: MyExampleClass

• Constants: MY_EXAMPLE_CONSTANT

• In general: meaningful names, except for indices

• Comments:
  – file description
  – version history (bugs, new functionality)
  – user information (user guide)
  – implementation information (reference guide)
  – code comments
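
A short sketch of how these conventions might look in a source file. The file name, the version entry, the class FrameBuffer and the constant MAX_FRAME_LENGTH are all invented for this example.

```cpp
// File description: framebuf.h (hypothetical), buffer for one frame of samples.
// Version history:  0.1 first version (illustration only).
// User information: add samples with Add_Sample(), query with Sample_Count().
// Implementation:   samples are stored in a std::vector.

#include <cstddef>
#include <vector>

// Constants: MY_EXAMPLE_CONSTANT
const std::size_t MAX_FRAME_LENGTH = 512;

// Classes: MyExampleClass
class FrameBuffer
{
public:
    // Functions and methods: My_Example_Function()
    // Returns false when the buffer is already full.
    bool Add_Sample(const double new_sample)
    {
        bool is_added = false;
        if (samples.size() < MAX_FRAME_LENGTH)
        {
            samples.push_back(new_sample);
            is_added = true;
        }
        return is_added;
    }

    std::size_t Sample_Count() const
    {
        return samples.size();
    }

private:
    // Variables: my_example_var (private data member)
    std::vector<double> samples;
};
```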

Implementation: Basic Design Rules

– Project modularity achieved through classes.
– Structure the program by Classes only (only methods are allowed, no separate functions)
– Project is decomposed into modules with as little cross-dependence as possible
– One module per class
– Classes should have minimal interfaces
– Modules should have minimal dependencies
– Implementation issues hidden from clients (information hiding)
– Inheritance should be extensively used

• Advantages:
  – Improved readability
  – Reduced maintenance work
  – Improved robustness

Implementation: Program Services

• Safe memory management
  – memory service
  – dynamic memory management: C++ without pointers

• Diagnostics
  – decide which data must be checked when, and define the actions

• File management, user interfaces

• User program configuration management

• Text data management

• Mathematical data management

Implementation: Memory Services & Diagnostics

• Memory Services

• Diagnostics

Some Important Topics

• Portability
  – portability and defined options in files: compatib.h, defopt.h, Boolean.h

• Testing
  – test routines and version history

• Reliability
  – readability
  – maintainability

RES General Specification

• RES (Recognition Experimental System) is an HMM-based experimental tool for continuous multispeaker speech recognition. It works on recorded speech files and includes:
  – the batch modules for acoustic model initialization and training
  – grammar model training
  – phoneme/word recognition
  – performance evaluation

• RES is state of the art in speaker-independent phonetic recognition:
  – 69.2% correct on all TIMIT test data, using context-independent phonetic models
  – 87.83% correct in speaker-independent word recognition on ATIS, using context-independent phonetic models not optimally tuned on this database

RES General Specification

• How to build an ASR system for a different language?
  – We need many segmented speech recordings to feed the training programs and get good HMM models of our voices.
  – Use a freeware program like Snack 1.4 (search on the Internet) to prepare the data.
  – Search for a Dutch multispeaker phonetic database.
  – Design and feed the right language model.

• Speech samples to train and test the RES system?
  – You can download speech samples from the Linguistic Data Consortium (LDC) after you have obtained a user account.

General Specification

• Required C++ custom libraries:
  – none

• Portability:
  – Linux
  – Windows 3.x, Windows 95, NT
  – DOS with DjGpp

• Compilers:
  – Ms Visual C++ >4.0
  – DjGpp version 2.8.1 or
  – GNU Linux Gpp version 2.8.1 or newer

Speech Database

• Speech data retrieval

• Speech files:
  – NIST1A (ATIS x, TIMIT)
  – MS WAV
  – custom formats, by adding software drivers

• Label files:
  – ATIS
  – TIMIT
  – various subsets; custom label alphabets included in a file; custom label handling by supplying a driver

• Other options:
  – overlap
  – window length
  – file buffering
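
How the window length and overlap options interact can be sketched as follows. This assumes both are given in samples and that only complete windows are analysed; the slides do not state the units RES uses, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of complete analysis windows that fit in a recording.
// Assumption: window_length and overlap are expressed in samples.
std::size_t Frame_Count(const std::size_t sample_count,
                        const std::size_t window_length,
                        const std::size_t overlap)
{
    assert(window_length > 0);
    // Each window must advance by at least one sample.
    assert(overlap < window_length);
    std::size_t frame_count = 0;
    if (sample_count >= window_length)
    {
        const std::size_t frame_shift = window_length - overlap;
        frame_count = 1 + (sample_count - window_length) / frame_shift;
    }
    return frame_count;
}

// Index of the first sample of a given window.
std::size_t Frame_Start(const std::size_t frame_index,
                        const std::size_t window_length,
                        const std::size_t overlap)
{
    assert(overlap < window_length);
    return frame_index * (window_length - overlap);
}
```

For example, one second of speech sampled at 16 kHz (16000 samples) with a window of 400 samples and an overlap of 240 samples gives a shift of 160 samples and 98 complete windows.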

Speech Signal Analysis

• Feature Extraction

• Signal processing:
  – Any concatenation of processing blocks is allowed. Each block performs a class of processing and the actual processing is specified by the options.

• Available processing blocks:
  – Preemphasis_and_Hamming
  – Mean_Subtraction
  – FFT
  – MFCC with Log/non Log Energy
  – any order differences
  – Other blocks can be added by supplying proper drivers.
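
The idea of concatenating processing blocks can be sketched as below. This is not the RES interface: the types Frame and ProcessingBlock and all function names are assumptions made for the example. Pre-emphasis and the Hamming window use their standard textbook definitions, and the mean is subtracted per frame, which may differ from what the RES Mean_Subtraction block does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

typedef std::vector<double> Frame;
typedef std::function<Frame(const Frame&)> ProcessingBlock;

// Pre-emphasis: y[n] = x[n] - coefficient * x[n-1], with y[0] = x[0].
Frame Preemphasis(const Frame& input, const double coefficient)
{
    assert(coefficient >= 0.0);
    assert(coefficient <= 1.0);
    Frame output(input);
    for (std::size_t n = 1; n < input.size(); n = n + 1)
    {
        output[n] = input[n] - coefficient * input[n - 1];
    }
    return output;
}

// Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1)).
Frame Hamming(const Frame& input)
{
    const double TWO_PI = 2.0 * std::acos(-1.0);
    Frame output(input);
    if (input.size() > 1)
    {
        const double last_index = static_cast<double>(input.size() - 1);
        for (std::size_t n = 0; n < input.size(); n = n + 1)
        {
            const double angle = TWO_PI * static_cast<double>(n) / last_index;
            const double weight = 0.54 - 0.46 * std::cos(angle);
            output[n] = input[n] * weight;
        }
    }
    return output;
}

// Subtract the mean of the frame from every sample.
Frame Mean_Subtraction(const Frame& input)
{
    Frame output(input);
    if (!input.empty())
    {
        double sum = 0.0;
        for (std::size_t n = 0; n < input.size(); n = n + 1)
        {
            sum = sum + input[n];
        }
        const double mean = sum / static_cast<double>(input.size());
        for (std::size_t n = 0; n < input.size(); n = n + 1)
        {
            output[n] = input[n] - mean;
        }
    }
    return output;
}

// Apply the blocks one after another, in the order given.
Frame Apply_Blocks(const std::vector<ProcessingBlock>& blocks, const Frame& input)
{
    Frame data(input);
    for (std::size_t b = 0; b < blocks.size(); b = b + 1)
    {
        data = blocks[b](data);
    }
    return data;
}
```

A chain is then just a vector of blocks, for instance pre-emphasis with a coefficient chosen by the caller followed by Hamming; adding a new block means adding one more function with the same signature.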

Hidden Markov Models

• HMM model Initialization

• HMM topology:
  – 4 predefined types with configurable number of states.

• Acoustic Units:
  – as allowed by the available database

• Emission densities:
  – Untied Gaussian mixtures
  – full or diagonal covariance matrix
  – number of mixtures configurable for each acoustic unit

• Initialization method:
  – maximum distortion splitting on segmented database
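
As an illustration of an emission density, the sketch below evaluates the log-likelihood of a feature vector under a Gaussian mixture with diagonal covariance matrices. It uses the standard formulas, not RES code; the function names and the plain-vector parameter layout are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// log N(x; mean, diag(variance)) =
//   -0.5 * sum_d [ log(2 * pi * variance_d) + (x_d - mean_d)^2 / variance_d ]
double Log_Gaussian(const std::vector<double>& x,
                    const std::vector<double>& mean,
                    const std::vector<double>& variance)
{
    assert(x.size() == mean.size());
    assert(x.size() == variance.size());
    const double TWO_PI = 2.0 * std::acos(-1.0);
    double log_density = 0.0;
    for (std::size_t d = 0; d < x.size(); d = d + 1)
    {
        assert(variance[d] > 0.0);
        const double difference = x[d] - mean[d];
        const double term = std::log(TWO_PI * variance[d])
                          + difference * difference / variance[d];
        log_density = log_density - 0.5 * term;
    }
    return log_density;
}

// log sum_k weights[k] * N(x; means[k], diag(variances[k])),
// computed with the log-sum-exp trick for numerical stability.
double Log_Mixture(const std::vector<double>& x,
                   const std::vector<double>& weights,
                   const std::vector<std::vector<double> >& means,
                   const std::vector<std::vector<double> >& variances)
{
    assert(!weights.empty());
    assert(weights.size() == means.size());
    assert(weights.size() == variances.size());
    std::vector<double> terms(weights.size());
    double largest = -std::numeric_limits<double>::infinity();
    for (std::size_t k = 0; k < weights.size(); k = k + 1)
    {
        assert(weights[k] > 0.0);
        terms[k] = std::log(weights[k]) + Log_Gaussian(x, means[k], variances[k]);
        if (terms[k] > largest)
        {
            largest = terms[k];
        }
    }
    double sum = 0.0;
    for (std::size_t k = 0; k < weights.size(); k = k + 1)
    {
        sum = sum + std::exp(terms[k] - largest);
    }
    return largest + std::log(sum);
}
```

With a diagonal covariance the cost per component is linear in the feature dimension, which is one reason it is often preferred over a full covariance matrix.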

Hidden Markov Models Training

• Training algorithm:
  – Single and Simultaneous Model Re-estimation Baum-Welch.
  – parameter re-estimation: selective by configuration.

Language Models

• Language Model:
  – unigram and bigram on words and phonemes
  – Smoothing techniques: Good-Turing, non-linear and linear interpolation model
  – Word Clustering: minimum mean square error on transition probability
  – Perplexity: word and phoneme based computation
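
A minimal sketch of a bigram model with linear interpolation smoothing and a perplexity computation. The class BigramModel, its methods and the fixed interpolation weight are assumptions for illustration; Good-Turing, non-linear interpolation and word clustering are not shown, and the RES implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> WordPair;

class BigramModel
{
public:
    // bigram_weight is the interpolation weight given to the bigram estimate.
    explicit BigramModel(const double bigram_weight)
        : lambda(bigram_weight), total_count(0.0)
    {
        assert(lambda >= 0.0);
        assert(lambda <= 1.0);
    }

    void Train(const std::vector<std::string>& words)
    {
        for (std::size_t i = 0; i < words.size(); i = i + 1)
        {
            unigram_counts[words[i]] += 1.0;
            total_count += 1.0;
            if (i + 1 < words.size())
            {
                history_counts[words[i]] += 1.0;
                bigram_counts[WordPair(words[i], words[i + 1])] += 1.0;
            }
        }
    }

    // P(word | previous) = lambda * c(previous, word) / c(previous)
    //                    + (1 - lambda) * c(word) / N
    // Falls back to the unigram estimate when the history was never seen.
    double Probability(const std::string& previous, const std::string& word) const
    {
        assert(total_count > 0.0);
        const double unigram = Lookup(unigram_counts, word) / total_count;
        const double history = Lookup(history_counts, previous);
        double probability = unigram;
        if (history > 0.0)
        {
            double pair_count = 0.0;
            const std::map<WordPair, double>::const_iterator found =
                bigram_counts.find(WordPair(previous, word));
            if (found != bigram_counts.end())
            {
                pair_count = found->second;
            }
            probability = lambda * pair_count / history + (1.0 - lambda) * unigram;
        }
        return probability;
    }

    // Perplexity = exp(-(1 / M) * sum of log P(w_i | w_{i-1})) over M transitions.
    // Every word must have a non-zero probability under the model.
    double Perplexity(const std::vector<std::string>& words) const
    {
        assert(words.size() > 1);
        double log_sum = 0.0;
        for (std::size_t i = 1; i < words.size(); i = i + 1)
        {
            const double probability = Probability(words[i - 1], words[i]);
            assert(probability > 0.0);
            log_sum = log_sum + std::log(probability);
        }
        return std::exp(-log_sum / static_cast<double>(words.size() - 1));
    }

private:
    static double Lookup(const std::map<std::string, double>& counts,
                         const std::string& key)
    {
        double count = 0.0;
        const std::map<std::string, double>::const_iterator found = counts.find(key);
        if (found != counts.end())
        {
            count = found->second;
        }
        return count;
    }

    double lambda;
    double total_count;
    std::map<std::string, double> unigram_counts;
    std::map<std::string, double> history_counts;
    std::map<WordPair, double> bigram_counts;
};
```

Trained on the toy sequence "a b a b" with a weight of 0.5, the model gives P(b | a) = 0.75 and P(a | a) = 0.25: the unseen bigram still receives probability mass from the unigram term. The same class works for phoneme sequences if phoneme labels are passed instead of words.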

Recognition

• Recognition
  – Recognition Unit: acoustic units, words
  – Algorithm Type: Viterbi with Beam search and Window search pruning strategies
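
A compact sketch of Viterbi decoding with beam pruning, working on log-probabilities. The function name, the matrix layout and the pruning rule (drop states whose score falls more than beam_width below the best score of the frame) are assumptions; the Window search strategy is not shown, and the RES implementation may be organised differently.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// log_initial[s], log_transition[from][to], log_emission[frame][s].
// Returns the best state sequence, one state index per frame.
std::vector<std::size_t> Viterbi_Beam(const std::vector<double>& log_initial,
                                      const Matrix& log_transition,
                                      const Matrix& log_emission,
                                      const double beam_width)
{
    const double NEGATIVE_INFINITY = -std::numeric_limits<double>::infinity();
    const std::size_t state_count = log_initial.size();
    const std::size_t frame_count = log_emission.size();
    assert(state_count > 0);
    assert(frame_count > 0);
    assert(beam_width >= 0.0);

    std::vector<double> score(state_count);
    for (std::size_t s = 0; s < state_count; s = s + 1)
    {
        score[s] = log_initial[s] + log_emission[0][s];
    }
    std::vector<std::vector<std::size_t> > previous_state(
        frame_count, std::vector<std::size_t>(state_count, 0));

    for (std::size_t t = 1; t < frame_count; t = t + 1)
    {
        double best_score = NEGATIVE_INFINITY;
        for (std::size_t s = 0; s < state_count; s = s + 1)
        {
            if (score[s] > best_score)
            {
                best_score = score[s];
            }
        }
        // Beam pruning: only states within beam_width of the best are extended.
        const double threshold = best_score - beam_width;
        std::vector<double> next_score(state_count, NEGATIVE_INFINITY);
        for (std::size_t from = 0; from < state_count; from = from + 1)
        {
            if (score[from] >= threshold)
            {
                for (std::size_t to = 0; to < state_count; to = to + 1)
                {
                    const double candidate =
                        score[from] + log_transition[from][to] + log_emission[t][to];
                    if (candidate > next_score[to])
                    {
                        next_score[to] = candidate;
                        previous_state[t][to] = from;
                    }
                }
            }
        }
        score = next_score;
    }

    // Backtrace from the best final state.
    std::size_t best_state = 0;
    for (std::size_t s = 1; s < state_count; s = s + 1)
    {
        if (score[s] > score[best_state])
        {
            best_state = s;
        }
    }
    std::vector<std::size_t> path(frame_count);
    path[frame_count - 1] = best_state;
    for (std::size_t t = frame_count - 1; t > 0; t = t - 1)
    {
        path[t - 1] = previous_state[t][path[t]];
    }
    return path;
}
```

A wide beam reproduces exact Viterbi decoding; a narrow beam is faster but can discard the state that would have led to the best overall path.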

Evaluation

• Evaluation: Wagner-Fischer algorithm
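
The Wagner-Fischer algorithm computes the minimum number of substitutions, deletions and insertions needed to turn the recognized sequence into the reference. A minimal sketch with unit costs follows; the function names are hypothetical, and the accuracy measure shown is (N - errors) / N. A "percent correct" figure would additionally need the separate substitution and deletion counts from a backtrace, which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Edit distance between reference and hypothesis, unit costs.
std::size_t Edit_Distance(const std::vector<std::string>& reference,
                          const std::vector<std::string>& hypothesis)
{
    const std::size_t rows = reference.size() + 1;
    const std::size_t columns = hypothesis.size() + 1;
    std::vector<std::vector<std::size_t> > distance(
        rows, std::vector<std::size_t>(columns, 0));
    for (std::size_t i = 0; i < rows; i = i + 1)
    {
        distance[i][0] = i;  // i deletions
    }
    for (std::size_t j = 0; j < columns; j = j + 1)
    {
        distance[0][j] = j;  // j insertions
    }
    for (std::size_t i = 1; i < rows; i = i + 1)
    {
        for (std::size_t j = 1; j < columns; j = j + 1)
        {
            std::size_t substitution = distance[i - 1][j - 1];
            if (reference[i - 1] != hypothesis[j - 1])
            {
                substitution = substitution + 1;
            }
            const std::size_t deletion = distance[i - 1][j] + 1;
            const std::size_t insertion = distance[i][j - 1] + 1;
            distance[i][j] = std::min(substitution, std::min(deletion, insertion));
        }
    }
    return distance[rows - 1][columns - 1];
}

// Accuracy in percent: 100 * (N - errors) / N, N = reference length.
// The value can be negative when there are many insertions.
double Word_Accuracy(const std::vector<std::string>& reference,
                     const std::vector<std::string>& hypothesis)
{
    assert(!reference.empty());
    const double reference_length = static_cast<double>(reference.size());
    const double errors = static_cast<double>(Edit_Distance(reference, hypothesis));
    return 100.0 * (reference_length - errors) / reference_length;
}
```

For the reference "a b c d" and the hypothesis "a x c d" the distance is 1 and the accuracy is 75%. The same functions work on phoneme label sequences.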

Projects

1. Dutch Speech Corpus + Database Interface (2 groups)
   – in an early phase some example classes should be available, like counting, etc.
   – maybe use tools like ‘praat’ (for wav labeling with phonetics), etc.

2. Signal Analysis and Feature Extraction (2 groups)

3. HMM Initialization + HMM Dutch Phonetic Training (2 groups)

4. Dutch Language Model + Word Class Training (2 groups)
   – in an early phase some examples should be available

5. Recognition (2 groups)

Evaluation (All)

Project Designs

The design of the project should contain the following:

• The implementation goals
  – The underlying technique and theory
  – A functional description of the starting-code and tools
  – The design of new code and functionality
  – Implementation goals and a time-scheme
  – NB if it is considered difficult to obtain all the goals within the current time-frame, team up with the other team

• Interfacing
  – Define the module-interfaces
  – Define the time-path for the essential module-inputs
  – Define a realistic time-path for the (partial-)outputs of the module.