NLP Documentation

  • 7/31/2019 NLP Documentation

    1/10

    ABV-Indian Institute of Information Technology

    and Management

    Natural Language Processing Lab Assignment

    Submitted To:

    Dr. Mahua Bhattacharya

    Submitted By:

    Ishan Gupta (2008IPG-37)

    iii. In the third step, the recordings were created.

    iv. To create the word-level transcription, a words.mlf file was created. Next, the mkphones0.led script was created to generate the phone-level transcriptions. The following command was then executed to create them:

    HLEd -A -D -T 1 -l '*' -d dict -i phones0.mlf mkphones0.led words.mlf

    This command creates phones0.mlf. The screenshot of its execution is given below:
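To illustrate what the phone-level expansion does, the following minimal Python sketch mimics it on a toy dictionary. It is not HLEd itself: the function name expand_to_phones and the dictionary entries are hypothetical, and the assumed edit script (expand words, delete sp, insert sil at both ends) follows the usual HTK tutorial convention rather than anything stated in this report.

```python
def expand_to_phones(words, pron_dict):
    """Expand a word sequence into phones, roughly as an HLEd edit script would."""
    phones = []
    for word in words:
        # Expand each word into the phones of its dictionary pronunciation.
        phones.extend(pron_dict[word])
    # Drop short-pause labels, which are not used in the first monophone pass.
    phones = [p for p in phones if p != "sp"]
    # Add a silence label at the start and the end of the utterance.
    return ["sil"] + phones + ["sil"]


# Hypothetical two-word dictionary, for illustration only.
toy_dict = {
    "DIAL": ["d", "ay", "ax", "l", "sp"],
    "ONE": ["w", "ah", "n", "sp"],
}

result = expand_to_phones(["DIAL", "ONE"], toy_dict)
print(result)
```

The real phones0.mlf is written by HLEd from words.mlf and dict; the sketch only shows the shape of the transformation.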

    v. In the fifth step, the audio files had to be converted to MFCC files. This was done by creating the codetrain.scp file and tuning the parameters in the config file. After both files were created, the following command was executed:

    HCopy -A -D -T 1 -C wav_config -S codetrain.scp

    The screenshot of execution of this command is given below:
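A script file passed to HCopy with -S lists one source/target pair per line. The following Python sketch shows one way such a list could be generated; the directory names wav and mfcc, the sample file names, and the helper codetrain_lines are assumptions made for illustration, not taken from the report.

```python
from pathlib import PurePosixPath


def codetrain_lines(wav_paths, mfcc_dir):
    """Return one 'source target' line per recording, as HCopy expects."""
    lines = []
    for wav in wav_paths:
        # Keep the base name of the recording and swap the extension.
        stem = PurePosixPath(wav).stem
        lines.append(f"{wav} {mfcc_dir}/{stem}.mfc")
    return lines


# Hypothetical recordings, for illustration only.
lines = codetrain_lines(["wav/sample1.wav", "wav/sample2.wav"], "mfcc")
print("\n".join(lines))
```

Writing the returned lines to codetrain.scp, one per line, would give a file suitable for the HCopy command above.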

    This command creates the files proto and vFloors in the hmm0 folder. The screenshot of execution of this step is given below:

    The next step involves the creation of flat-start monophones.

    This task was performed with the following steps.

    a. Create a new file called hmmdefs in your 'voxforge/manual/hmm0' folder: copy the monophones1 file to your hmm0 folder and rename it to hmmdefs.

    b. For each phone in hmmdefs: put the phone in double quotes; add '~h ' before the phone (note the space after the '~h'); and copy from line 5 onwards (i.e. starting from "<BEGINHMM>" to "<ENDHMM>") of the hmm0/proto file and paste it after each phone. Leave one blank line at the end of your file.

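The manual editing in steps a and b can also be scripted. The sketch below is a minimal Python version under stated assumptions: the model body is taken to start at line 5 of the prototype, the function name build_hmmdefs is hypothetical, and the prototype content shown is a placeholder rather than real HCompV output.

```python
def build_hmmdefs(phones, proto_lines):
    """Repeat the prototype model body once per phone, each under its own ~h header."""
    body = proto_lines[4:]  # line 5 onwards of the prototype
    out = []
    for phone in phones:
        # Phone name in double quotes, preceded by '~h ' (with the space).
        out.append(f'~h "{phone}"')
        out.extend(body)
    # Terminate the last line and leave one blank line at the end of the file.
    return "\n".join(out) + "\n\n"


# Placeholder prototype, for illustration only.
proto = [
    "~o",
    "(header line 2)",
    "(header line 3)",
    '~h "proto"',
    "<BEGINHMM>",
    "(states and transition matrix)",
    "<ENDHMM>",
]

text = build_hmmdefs(["sil", "aa"], proto)
print(text)
```

In practice the phone list would be read from the monophones file and the prototype from hmm0/proto.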
    A file named macros also had to be created, which involved the following steps:

    create a new file called macros in hmm0; copy vFloors to macros; copy the first 3 lines of hmm0/proto (starting from ~o) and add them to the top of the macros file.
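The macros file is therefore the first three lines of the prototype followed by the contents of vFloors. A minimal Python sketch of that concatenation is given below; the function name build_macros and the placeholder file contents are assumptions for illustration only.

```python
def build_macros(proto_lines, vfloors_lines):
    """Prepend the first three prototype lines to the variance floor definitions."""
    return "\n".join(proto_lines[:3] + vfloors_lines) + "\n"


# Placeholder contents, for illustration only.
proto = ["~o", "(header line 2)", "(header line 3)", '~h "proto"', "<BEGINHMM>"]
vfloors = ["~v varFloor1", "(variance floor values)"]

macros = build_macros(proto, vfloors)
print(macros)
```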

    Next, nine folders, hmm1 to hmm9, were created in the project directory (htk1), and the following command was executed, which created the files hmm1/hmmdefs and hmm1/macros:

    HERest -A -D -T 1 -C config -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones1

    The screenshot of execution is given below

    Similarly, the following two commands were executed to create files in the folders hmm2 and hmm3, respectively:

    HERest -A -D -T 1 -C config1 -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 monophones1

    HERest -A -D -T 1 -C config1 -I phones0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 monophones1
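Because each re-estimation pass differs only in its input and output folders, the commands can be generated rather than typed. The following Python sketch builds the two argument lists above without running them; the helper name herest_command and its default arguments are assumptions chosen to match these two commands.

```python
def herest_command(src, dst, config="config1", mlf="phones0.mlf",
                   hmm_list="monophones1"):
    """Build the HERest argument list for one re-estimation pass from src to dst."""
    return [
        "HERest", "-A", "-D", "-T", "1",
        "-C", config,
        "-I", mlf,
        "-t", "250.0", "150.0", "1000.0",
        "-S", "train.scp",
        "-H", f"{src}/macros",
        "-H", f"{src}/hmmdefs",
        "-M", dst,
        hmm_list,
    ]


# hmm1 -> hmm2, then hmm2 -> hmm3.
commands = [herest_command(f"hmm{i}", f"hmm{i + 1}") for i in (1, 2)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run on a machine where HTK is installed; the sketch only prints the command lines.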

    vii. In the next step, the main task was fixing the silence model. This was done by creating an sp model. First, the contents of folder hmm3 were copied to folder hmm4, and the following steps were performed:

    copy and paste the sil model in hmmdefs and rename the new one sp (don't delete your old "sil" model, you will need it; just make a copy of it);

    remove states 2 and 4 from the new sp model (i.e. keep the 'centre state' of the old sil model in the new sp model);

    change <NUMSTATES> to 3; change <STATE> to 2; change <TRANSP> to 3; change the matrix in <TRANSP> to a 3 by 3 array; change the numbers in the matrix as follows:

    0.0 1.0 0.0
    0.0 0.9 0.1
    0.0 0.0 0.0
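A hand-edited transition matrix is easy to get wrong, so a quick sanity check can help. The Python sketch below tests the properties the 3 by 3 matrix above satisfies: it is square, every row except the last sums to 1, and the last (exit state) row is all zeros. The function name check_transp is hypothetical, and the check is an illustration, not part of HTK.

```python
def check_transp(matrix, tol=1e-9):
    """Return True if the matrix looks like a valid HMM transition matrix."""
    n = len(matrix)
    # The matrix must be square.
    if any(len(row) != n for row in matrix):
        return False
    # Every state except the exit state must have outgoing probabilities summing to 1.
    if any(abs(sum(row) - 1.0) > tol for row in matrix[:-1]):
        return False
    # The exit state has no outgoing transitions.
    return all(value == 0.0 for value in matrix[-1])


sp_transp = [
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0],
]
print(check_transp(sp_transp))
```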

    Then the sil.hed file was created and the following command was executed:

    HHEd -A -D -T 1 -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed monophones1

    This command created the files hmmdefs and macros in the folder hmm5. The screenshot of execution of this command is given below:

    Next, the following commands were executed to create the hmmdefs and macros files in the hmm6 and hmm7 folders, respectively:

    HERest -A -D -T 1 -C config -I phones1.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 monophones0

    HERest -A -D -T 1 -C config1 -I phones1.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 monophones1

    viii. The training data was realigned with the following command:

    HVite -A -D -T 1 -l '*' -o SWT -b SENT-END -C config -H hmm7/macros -H hmm7/hmmdefs -i aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I words.mlf -S train.scp dict monophones0 > HVite_log

    The snapshot of execution of this command is given below:

    Finally, the following two commands were executed to create the hmmdefs and macros files in the hmm8 and hmm9 folders, respectively.

    HERest -A -D -T 1 -C config1 -I aligned.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones1

    HERest -A -D -T 1 -C config1 -I aligned.mlf -t 250.0 150.0 3000.0 -S train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 monophones1