Readme

***************************************************************************
ETSI/AURORA PROJECT
Ericsson Eurolab Germany - January, 25th 2000.
***************************************************************************

List of directories
---------------------------------------------------
This CD-ROM contains the following directories:

./speechdata : Part of the modified TIdigits database (in big-endian format)

./recognizer : Scripts for the HTK recognizer.
./FE_v2_0    : Front-end for Distributed Speech Recognition (FE C-code v2.0).

This is one CD (out of 4) which contains
- speech data
- shell scripts to train and run the HTK recognizer
- the C-code of the cepstral front-end for distributed speech recognition.

For the shell scripts and the C-code there exist individual
Readme files in the corresponding subdirectories.

    There exist 3 further CDs which contain only speech data.

All speech data should be copied into one subdirectory. About
2.5 GByte of disk space is needed for all data. The common
directory is called "speechdata" on the CDs. Please do not
change this name! Speech data are in SHORT format (16-bit
samples) without any header.
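As an illustration only: such a headerless big-endian file can be
converted to WAV for a quick listening check with a tool like sox
(not part of this distribution; the file name below is made up):

    # Hypothetical check, assuming a modern sox installation:
    # raw input, 8 kHz, 16-bit signed, big-endian.
    sox -t raw -r 8000 -e signed-integer -b 16 -B \
        speechdata/example.raw check.wav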

Short description of the recognition experiments
---------------------------------------------------
The task is the speaker-independent recognition of digit sequences.
All speech data are derivatives of the TIdigits database at a
sampling frequency of 8 kHz. Whole-word models are created for all
digits with the HTK recognizer.

Two training modes are considered:
- training on clean data
- multi-condition training on noisy data

    "Clean" corresponds to TIdigits training data downsampled to 8 kHzand filtered with a G712 characteristic."Noisy" data corresponds to TIdigits training data downsampled to 8 kHz,filtered with a G712 characteristic and noise artificially added atseveral SNRs (20dB, 15dB, 10 dB, 5dB, *clean*no noise added).Four noises are used:- recording inside a subway- babble

    - car noise- recording in an exhibition hallSo, in total data from 20 different conditions are taken as inputfor the multi-condition training mode.

Three different sets of speech data are taken for the recognition.

    Set "a" consists of TIdigits test data downsampled to 8 kHz,filtered with a G712 characteristic and noise artificially added atseveral SNRs (20dB, 15dB, 10 dB, 5dB, 0dB, -5dB, *clean*no noise added).

    The noises are the same as for the multi-condition training.

    Set "b" consists of TIdigits test data downsampled to 8 kHz,filtered with a G712 characteristic and noise artificially added atseveral SNRs (20dB, 15dB, 10 dB, 5dB, 0dB, -5dB, *clean*no noise added).The noises are:- restaurant- street- airport- train stationThose noises shall represent realistic scenarios for using a mobile

    terminal.

    Set "c" consists of TIdigits test data downsampled to 8 kHz,filtered with a MIRS characteristic and noise artificially added atseveral SNRs (20dB, 15dB, 10 dB, 5dB, 0dB, -5dB, *clean*no noise added).The noises are:- subway- streetThe noises are the same as used in test set a and b.The intention of test set c is the consideration of a differentfrequency characteristic (MIRS instead of G712). This shouldsimulate the influence of terminals with different characteristics.

Two shell scripts exist for running the training in the two modes
and performing recognition on all 3 test sets. The scripts are
called "train_recog_clean" and "train_recog_multi" and can be
found in the "recognizer" subdirectory.

Furthermore, a shell script called "create_pattern" can be found
in the "recognizer" subdirectory to extract the acoustic features
from the speech files by applying the front-end version 2.0 for
distributed speech recognition.

INSTALLATION
-----------------------------

- Copy the speech data from all CDs to the same "speechdata"
  subdirectory (2.5 GByte disk space needed). A summarizing
  sketch follows after this list.
- Copy the script and the front-end subdirectories.
- Compile the front-end code. There exist several "make"-files.
- Compile the 3 programs in the subdirectory recognizer/bin.
  There exists a short script "gen_exe" which shows the
  compilation of the programs with the GNU compiler.
- Define your paths in the header of the "create_pattern" script.
  The $SPEECH_ROOT shell variable defines the path where you
  stored the "speechdata" subdirectory. The $REC_DIR shell
  variable is the path to the subdirectory of the recognition
  scripts. The $FEAT_ROOT shell variable is the root directory
  where to store the output of the feature extraction. You need
  about 1.2 GByte of disk space to store all feature files.
- Run the "create_pattern" script.
- Define your paths in the headers of the "train_recog_multi" and
  the "train_recog_clean" scripts. Names are the same as for the
  "create_pattern" script.
- Run the "train_recog_multi" and the "train_recog_clean" scripts.
- You should find 6 files in the $FEAT_ROOT directory which
  contain the recognition results: "multi_testa.result",
  "multi_testb.result", "multi_testc.result", "clean_testa.result",
  "clean_testb.result" and "clean_testc.result". Those files
  contain the results for the clean and multi-condition training
  modes as well as for the three test sets.

    - Compare your results to the ones you can find in the subdirectory recognizer/results_fe_v2.0.

- As an alternative front-end you can run the script
  "create_pattern_htk" which uses HTK to extract the features.
  Define the variables in the script first as described above.
- Set the $FEAT_ROOT variable in the "train_recog_xxx" scripts.
- Files containing the recognition results can be found in the
  subdirectory recognizer/results_htk.
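As a rough sketch only, the installation steps above might look as
follows on a Unix machine. Every path is a hypothetical
placeholder, and the three variables are meant to be edited inside
the script headers, not exported on the command line:

    # Hypothetical walk-through; adjust all paths to your system.
    cp -r /cdrom/speechdata/. /data/aurora/speechdata  # repeat for all 4 CDs
    cp -r /cdrom/recognizer /cdrom/FE_v2_0 /data/aurora

    ( cd /data/aurora/FE_v2_0 && make )              # pick the matching make-file
    ( cd /data/aurora/recognizer/bin && ./gen_exe )  # GNU compiler assumed

    # Values to put into the headers of "create_pattern" and the
    # "train_recog_xxx" scripts:
    SPEECH_ROOT=/data/aurora          # holds the "speechdata" subdirectory
    REC_DIR=/data/aurora/recognizer   # recognition scripts
    FEAT_ROOT=/data/aurora/features   # feature output (about 1.2 GByte)

    ./create_pattern
    ./train_recog_multi
    ./train_recog_clean
    ls $FEAT_ROOT/*.result            # the 6 result files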

NOTE FOR USERS ON MACHINES WITH LITTLE-ENDIAN DATA STORAGE
----------------------------------------------------------
In case you are working on a machine where data are stored in
LITTLE-ENDIAN format but you are reading the speech data in
BIG-ENDIAN format as they are on the CDs, you have to set the
option "-swap" for the Aurora front-end. To run the training and
recognition on such a machine the option
   "NATURALREADORDER = TRUE"
is already set in the corresponding HTK config-file "config_tr".
Furthermore the option
   "NATURALWRITEORDER = TRUE"
is set in the config file "config_hcopy" for running the HTK
front-end, so that the same scripts for training and recognition
can be used.
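To illustrate what the "-swap" option does: the two bytes of every
16-bit sample are exchanged. On a little-endian machine the
standard "dd" utility performs the same swap on a raw file, which
can be handy for a quick check (file names are made up):

    # conv=swab exchanges every pair of input bytes
    # (big-endian <-> little-endian 16-bit samples).
    dd if=speechdata/example.raw of=example_le.raw conv=swab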

OWN FEATURE EXTRACTION
----------------------------------
- Modify the "create_pattern" script according to your program.
  In case you have a list-based program (not a file-based one like
  the existing front-end), things should get much easier.

- In case you are going to change the number of acoustic
  parameters of each feature vector, you have to:
  - create a corresponding HTK prototype file
  - change the settings of the variables "Proto" and "NUM_COEF"
    in the "train_recog_xxx" scripts (see the sketch below)