
Wake-up Word Detector
Douglas Rauscher
ECE5525
April 30, 2008



Introduction

The purpose of this project is to generate feature vectors and Hidden Markov Models (HMMs) for a single word.

Data is processed using Sphinx and Matlab.

The Wake-up Word chosen is “Help”.


Corpus

The corpus used is the original WUW_Corpus, provided on the ECE5526 server: ftp://163.118.203.219/CORPORA/WUW_Corpora/WUW_Corpus/

This corpus was used because single-word utterances of “Help” are frequent in the data set.

The audio data is in µ-law format.


File lists & Transcriptions

Before processing in Sphinx, the following “fileids” and “transcription” files need to be created:
• wuw_corpus_train.fileids
• wuw_corpus_train.transcription
• wuw_corpus_test.fileids
• wuw_corpus_test.transcription

These were created in Matlab by searching the corpus's “|”-delimited transcription file, wuw.trans, for “Help” utterances.

80% of the “Help” utterances were used in the training list; the remaining 20% were used in the test list.

All utterances not transcribed as “Help” were included in the test set to test for false alarms.

A handful of the utterances in the original .trans file were manually removed from the list because:
• They had no data bytes in the file
• Sphinx had trouble with the sound quality
• The utterance was cut off in such a way that Sphinx threw an error
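For readers without Matlab, the list-building logic can be sketched in Python. This is a hypothetical re-expression, not code from the project: it assumes the records have already been parsed into 7-field tuples in the same order dcr_extract.m reads them (gender, dialect, phone type, filename, CallNO, UttNO, orthography), and the function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical record layout, mirroring the fields dcr_extract.m extracts:
# (gender, dialect, phone_type, filename, call_no, utt_no, ortho)
Record = Tuple[str, str, str, str, str, str, str]


def utt_id(rec: Record) -> str:
    # Utterance ID pattern used in dcr_extract.m: WUW<filename>_<CallNO>
    return "WUW%s_%s" % (rec[3], rec[4])


def fileid(rec: Record) -> str:
    # fileids entry: calls/<filename>/WUW<filename>_<CallNO>
    return "calls/%s/%s" % (rec[3], utt_id(rec))


def split_lists(records: List[Record], train_frac: float = 0.8):
    """Return (train_fileids, train_trans, test_fileids, test_trans)."""
    help_recs = [r for r in records if r[6] == "Help"]
    other_recs = [r for r in records if r[6] != "Help"]
    n_train = int(len(help_recs) * train_frac)  # floor, like floor(...*0.8)
    train = help_recs[:n_train]
    test = help_recs[n_train:] + other_recs  # non-"Help" utterances test false alarms

    train_fileids = [fileid(r) for r in train]
    # Training transcripts carry <s> ... </s>; test transcripts do not.
    train_trans = ["<s> %s </s> (%s)" % (r[6].upper(), utt_id(r)) for r in train]
    test_fileids = [fileid(r) for r in test]
    test_trans = ["%s (%s)" % (r[6].upper(), utt_id(r)) for r in test]
    return train_fileids, train_trans, test_fileids, test_trans
```

As in dcr_extract.m, the split takes the first 80% of “Help” utterances in file order rather than a random sample.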


dcr_extract.m

close all; clear all; clc;

% Read every "|"-delimited field of the corpus transcription file
A = textread('C:\CMUtutorial\WUW_Corpus\wuw.trans','%s','delimiter','|');

% Each record starts with a gender field; use it to locate the records
idx = 1:length(A);
idx = idx((strcmp(A,'Male')+strcmp(A,'Female'))>0);

gender     = A(idx);
dialect    = A(idx+1);
phone_type = A(idx+2);
filename   = A(idx+3);
CallNO     = A(idx+4);
UttNO      = A(idx+5);
Ortho      = A(idx+6);   % orthographic transcription

% Separate "Help" utterances from everything else
AllIdx     = 1:length(Ortho);
HelpIdx    = AllIdx(strcmp(Ortho,'Help'));
NotHelpIdx = AllIdx(~strcmp(Ortho,'Help'));
N = floor(length(HelpIdx)*0.8);   % first 80% of "Help" go to training

% Training lists: first N "Help" utterances
fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.fileids','w');
ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_train.transcription','w');
for k=1:N
    fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
        char(filename(HelpIdx(k))),...
        char(CallNO(HelpIdx(k))));
    fprintf(ftsn,'<s> %s </s> (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
        char(filename(HelpIdx(k))),...
        char(CallNO(HelpIdx(k))));
end
fclose(fout);
fclose(ftsn);

% Test lists: remaining "Help" utterances plus all other utterances
fout = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.fileids','w');
ftsn = fopen('C:\CMUtutorial\WUW_Corpus\etc\wuw_corpus_test.transcription','w');
% Remaining "Help"
for k=(N+1):length(HelpIdx)
    fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(HelpIdx(k))),...
        char(filename(HelpIdx(k))),...
        char(CallNO(HelpIdx(k))));
    fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(HelpIdx(k)))),...
        char(filename(HelpIdx(k))),...
        char(CallNO(HelpIdx(k))));
end
% Other utterances
for k=1:length(NotHelpIdx)
    fprintf(fout,'calls/%s/WUW%s_%s\n',char(filename(NotHelpIdx(k))),...
        char(filename(NotHelpIdx(k))),...
        char(CallNO(NotHelpIdx(k))));
    fprintf(ftsn,'%s (WUW%s_%s)\n',upper(char(Ortho(NotHelpIdx(k)))),...
        char(filename(NotHelpIdx(k))),...
        char(CallNO(NotHelpIdx(k))));
end
fclose(fout);
fclose(ftsn);


Data preparation

Corpus data was originally:
• File extension .ulaw
• 8-bit µ-law format
• 8 kHz sample rate

This data must be converted, because Sphinx cannot read .ulaw files.

Format chosen to convert to:
• File extension .raw
• 16-bit linear quantization
• 16 kHz sample rate (upsampled from 8 kHz by linear interpolation)
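The conversion above can be sketched in Python. This sketch swaps in the standard ITU-T G.711 µ-law decoder (bit inversion, sign bit, 3-bit exponent, 4-bit mantissa) in place of the continuous µ-law expansion formula that ulaw2raw.m uses, so sample values will differ somewhat from the project's files. It assumes little-endian 16-bit output (the native byte order of an x86 PC running Matlab) and, unlike ulaw2raw.m, holds the final sample when computing the last interpolated point. The function names are hypothetical.

```python
import struct
from typing import List


def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte (0..255) to a linear sample (about +/-32124)."""
    u = ~code & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 = bias
    return -magnitude if sign else magnitude


def upsample2_linear(samples: List[int]) -> List[int]:
    """Double the sample rate by inserting the midpoint between neighbours."""
    out = []
    for i, x in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else x  # hold last sample
        out.append(x)
        out.append((x + nxt) // 2)
    return out


def ulaw_bytes_to_raw16(data: bytes) -> bytes:
    """8 kHz mu-law bytes -> 16 kHz headerless little-endian int16 bytes."""
    linear = [ulaw_to_linear(b) for b in data]
    up = upsample2_linear(linear)
    return struct.pack("<%dh" % len(up), *up)
```

Reading a .ulaw file's bytes with `open(path, "rb").read()` and writing the result of `ulaw_bytes_to_raw16` to a .raw file corresponds to one iteration of the ulaw2raw.m loop.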


ulaw2raw.m

% Driver (run from a separate script or the command window):
% convert every call directory 00000 through 00252
for k=0:252
    ulaw2raw(sprintf('C:\\CMUtutorial\\WUW_Corpus\\calls\\%05d\\',k),0);
end

function ulaw2raw(filepath,playflag)
% ulaw2raw('C:\CMUtutorial\WUW_Corpus\calls\00000\',0);
% Converts each 8 kHz, 8-bit u-law file in filepath to a 16 kHz, 16-bit
% linear .raw file. Set playflag to 1 to listen to each file.
cd_save = cd;
cd(filepath);
files = dir('*.ulaw');   % only .ulaw files (no reliance on '.'/'..' order)

% US standard u-law coeff
u=255;

for k=1:length(files)
    if (files(k).isdir==0)
        disp(files(k).name);
        fin = fopen(files(k).name,'r');
        A = fread(fin,'int8');
        % move data to proper sign
        A1 = A.*(A<=0)+(127-A).*(A>0);
        % remove u-law (continuous u-law expansion; B1 lies in [-1,1])
        B1 = sign(A1).*(1/u).*(((1+u).^abs(A1/128))-1);
        % upsample 8 kHz -> 16 kHz: interleave each sample with the
        % midpoint to the next sample
        B2 = reshape([B1,((B1+[B1(2:end);0])./2)].',1,[]);
        if(playflag)
            sound(B2,16000)
            pause(length(B2)/16000);
        end
        fclose(fin);
        generateRawWav(files(k).name(1:end-5),B2);
    end
end

cd(cd_save);

function generateRawWav(filename,data)
% data lies in [-1,1]; scale it to the full 16-bit range
fout = fopen(strcat(filename,'.raw'),'w');
dataq = round(32767.*data);
fwrite(fout,dataq,'int16');
fclose(fout);
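Before running Sphinx, it can help to sanity-check a few converted files. The hypothetical Python helper below reads headerless little-endian 16-bit samples and reports duration and peak amplitude. At 16 kHz the duration should match the original .ulaw file (one byte per sample at 8 kHz), and a peak far below the 16-bit limit of 32767 suggests a scaling problem in the conversion.

```python
import struct
from typing import Tuple


def raw16_stats(data: bytes, sample_rate: int = 16000) -> Tuple[float, int]:
    """Return (duration_seconds, peak_abs) for headerless little-endian int16 audio."""
    n = len(data) // 2                      # ignore a trailing odd byte, if any
    samples = struct.unpack("<%dh" % n, data[: 2 * n])
    peak = max((abs(s) for s in samples), default=0)
    return n / sample_rate, peak
```

For a file on disk, pass the bytes read with `open(path, "rb").read()`.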


Language model creation

For a Wake-up Word recognizer, a language model is not particularly useful in detecting the word. Sphinx lets the user weight the language model's contribution to its calculations, but does not appear to allow the language model to be disabled altogether.

Therefore, to avoid errors, a custom language model had to be created:
1. The lmtool generator (http://www.speech.cs.cmu.edu/tools/lmtool.html) was used to convert a text file containing only the word “Help” into a .lm file.
2. The lm3g2dmp tool was used to convert the .lm file to .lm.DMP format. From a Windows command prompt:
cd C:\CMUtutorial\lm3g2dmp\Debug
lm3g2dmp 7092.lm ./
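A single-word model like this is small enough to write by hand. The Python sketch below emits an ARPA-format bigram model whose only sentence is “<s> HELP </s>”. It is an illustrative stand-in, not the file lmtool actually produced (7092.lm), and it has not been checked against lm3g2dmp; the function name is hypothetical. The unigram probabilities are the relative frequencies of HELP and </s> in that one sentence (1/2 each, log10 ≈ -0.3010), and both bigrams have probability 1 (log10 = 0).

```python
def single_word_arpa(word: str) -> str:
    """Return an ARPA bigram LM whose only training sentence is '<s> WORD </s>'."""
    w = word.upper()  # match the upper-case transcriptions
    lines = [
        "\\data\\",
        "ngram 1=3",
        "ngram 2=2",
        "",
        "\\1-grams:",
        "-99.0000 <s> 0.0000",      # <s> is never predicted
        "-0.3010 %s 0.0000" % w,    # log10(1/2), back-off weight 0
        "-0.3010 </s>",             # log10(1/2)
        "",
        "\\2-grams:",
        "0.0000 <s> %s" % w,        # P(WORD | <s>) = 1
        "0.0000 %s </s>" % w,       # P(</s> | WORD) = 1
        "",
        "\\end\\",
        "",
    ]
    return "\n".join(lines)
```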


Training the Model

The Sphinx training configuration file (etc/sphinx_train.cfg) was edited to use the proper input files.

The maximum number of Gaussians was set to 8.

The number of HMM states was increased from 3 to 5, without significant improvement.

Sphinx commands:

cd c:/CMUtutorial/WUW_Corpus/
perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_train.fileids -cfg etc/sphinx_train.cfg -param etc/feat.params
perl scripts_pl/RunAll.pl

The first Perl script computes the feature vectors for every file in the training list; the second runs the full training sequence.
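The two model-size changes can be scripted rather than edited by hand. The Python sketch below rewrites a Perl-style `$NAME = value;` assignment in the training configuration text. The variable names `$CFG_FINAL_NUM_DENSITIES` and `$CFG_STATESPERHMM` are assumptions based on the SphinxTrain configuration template; check them against your copy of etc/sphinx_train.cfg before use. The helper name is hypothetical.

```python
import re


def set_cfg_value(cfg_text: str, name: str, value: str) -> str:
    """Replace the right-hand side of a Perl assignment '$NAME = ...;'."""
    pattern = re.compile(r"^(\s*\$%s\s*=\s*)[^;]*;" % re.escape(name), re.MULTILINE)
    # A function replacement avoids backslash escapes in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value + ";", cfg_text)
    if count == 0:
        raise KeyError("variable $%s not found" % name)
    return new_text
```

Typical use is to read sphinx_train.cfg, apply `set_cfg_value(text, "CFG_FINAL_NUM_DENSITIES", "8")` and `set_cfg_value(text, "CFG_STATESPERHMM", "5")`, and write the text back.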


Testing the Model

The Sphinx testing configuration file (etc/sphinx_decode.cfg) was edited to use the proper input files.

The language model weight was set to 1 (the lowest allowable setting).

The number of Gaussians was set to 8 to match the training configuration.

Sphinx commands:
perl scripts_pl/make_feats.pl -ctl etc/wuw_corpus_test.fileids -cfg etc/sphinx_decode.cfg -param etc/feat.params
perl scripts_pl/decode/slave.pl


Sphinx Output

Sphinx was used only to calculate acoustic scores, not to perform thresholding.

The resulting scores were parsed in Matlab, and PDF/CDF plots were generated.

See the attached output document for the raw Cygwin output.
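Because thresholding was left outside Sphinx, the threshold choice can be made explicit. The Python sketch below assumes, as the CDF directions in plotDistributions.m imply, that a higher (less negative) acoustic score means an utterance is more “Help”-like. It accepts an utterance when its score is at or above the threshold, and scans the observed scores for the threshold where the false-rejection and false-acceptance rates are closest (an approximate equal error rate). Function names are hypothetical.

```python
from typing import List, Tuple


def error_rates(help_scores: List[float], other_scores: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_reject, false_accept) when accepting scores >= threshold."""
    false_reject = sum(s < threshold for s in help_scores) / len(help_scores)
    false_accept = sum(s >= threshold for s in other_scores) / len(other_scores)
    return false_reject, false_accept


def equal_error_threshold(help_scores: List[float],
                          other_scores: List[float]) -> Tuple[float, float, float]:
    """Scan observed scores; return (threshold, FR, FA) with smallest |FR - FA|."""
    best = None
    for t in sorted(set(help_scores) | set(other_scores)):
        fr, fa = error_rates(help_scores, other_scores, t)
        if best is None or abs(fr - fa) < abs(best[1] - best[2]):
            best = (t, fr, fa)
    return best
```

Scanning only the observed scores is enough, because both error rates change only at those values.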


plotDistributions.m

% plotDistributions
clear all; clc; close all;

fn = 'C:\CMUtutorial\WUW_Corpus\logdir\decode\wuw_corpus-1-1.log';
RawText = textread(fn,'%s');

% Keep the 8 tokens starting at each "fv:" entry whose hypothesis is HELP
% (stop 7 tokens early so k+7 never runs past the end of RawText)
idx = [];
for k=1:(length(RawText)-7)
    if(~isempty(findstr(char(RawText(k)),'fv:')) &&...
            strcmp(char(RawText(k+1)),'HELP'))
        idx = [idx; k:k+7];
    end
end
RawText = RawText(idx);   % one row per utterance, 8 columns

% fetch and plot Acoustic Score histograms (column 5 holds the score;
% utterance IDs containing '_008>' are the true "Help" utterances)
HelpAScr = [];
FalsAScr = [];
for k=1:size(RawText,1)
    if(~isempty(findstr(char(RawText(k,1)),'_008>')))
        % True HELP
        HelpAScr = [HelpAScr str2num(char(RawText(k,5)))];
    else
        % Not a HELP
        FalsAScr = [FalsAScr str2num(char(RawText(k,5)))];
    end
end

% Normalized histograms (PDFs) over 101 shared bins
mn = min(min(HelpAScr),min(FalsAScr));
mx = max(max(HelpAScr),max(FalsAScr));
vals = mn:((mx-mn)/100):mx;
HelpAScrHist = hist(HelpAScr,vals);
HelpAScrHist = HelpAScrHist./sum(HelpAScrHist);
FalsAScrHist = hist(FalsAScr,vals);
FalsAScrHist = FalsAScrHist./sum(FalsAScrHist);
% CDFs: "Help" accumulates from the low end, other utterances from the high end
for k=1:length(vals)
    HelpAScrCDF(k) = sum(HelpAScrHist(1:k));
    FalsAScrCDF(k) = sum(FalsAScrHist(k:end));
end

figure;
subplot(2,1,1); plot(vals,HelpAScrHist,'b',vals,FalsAScrHist,'r');
title('Probability Density Function')
legend('Help','Other Utterances')
axis([mn,mx,0,1.1*max(max(HelpAScrHist),max(FalsAScrHist))]);
subplot(2,1,2); plot(vals,HelpAScrCDF,'b',vals,FalsAScrCDF,'r');
title('Cumulative Distribution Function')
axis([mn,mx,0,1.1]);


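Figure: output of plotDistributions.m. Top panel: Probability Density Function of acoustic scores; bottom panel: Cumulative Distribution Function (blue: Help, red: Other Utterances).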


Conclusions

Sphinx had trouble correctly detecting the word “Help” in this test, but a decent model was clearly created.

The test set was rather constrained and limited, and would benefit from a much larger sample of “Help” utterances.

Sphinx features that would have been nice:
• Native .ulaw file input
• A simpler mechanism for specifying the sample rate
• Native text file input for the language model, by integrating the .lm generator and .lm.DMP converter into Sphinx
• Better handling of utterance fragments