© Florida Institute of Technology

Speech Processing and Recognition

Access audio data in real time and apply to speech recognition

Final Exam Project
By Hesheng Li

Instructor: Dr. Kepuska
Department of Electrical and Computer Engineering

Overview

Introduction
Three models to access live audio data
How to get audio data by using the low-level API model?
Application in speech recognition
Comparison and Analysis
Conclusion

Introduction

Why? How?

Live audio data access has wide application!

Three models to access live audio data

High-level digital audio API: MCI

DirectSound

Low-level digital audio API: WaveX

High-level digital audio API: MCI

The Media Control Interface (MCI) provides standard commands for playing multimedia devices and recording multimedia resource files.

There are two ways to send a command to a device:

1. Command message interface

2. Command string interface

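Command message interface

Passing binary values and structures to an audio device is referred to as using the "command message interface".

We use the function mciSendCommand() to send commands using this approach.

Example:

/* MCI_OPEN_TYPE_ID means lpstrDeviceType holds a device-type ID, not a string */
waveParams.lpstrDeviceType = (LPCSTR)MCI_DEVTYPE_WAVEFORM_AUDIO;
waveParams.lpstrElementName = "C:\\WINDOWS\\CHORD.WAV";
/* DWORD_PTR keeps the pointer intact on 64-bit builds */
mciSendCommand(0, MCI_OPEN,
               MCI_WAIT | MCI_OPEN_ELEMENT | MCI_OPEN_TYPE | MCI_OPEN_TYPE_ID,
               (DWORD_PTR)(LPVOID)&waveParams);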

Command string interface

Passing strings to an audio device is referred to as using the "command string interface".

We use the function mciSendString() to send commands using this approach.

Example:

mciSendString("open C:\\WINDOWS\\CHORD.WAV type waveaudio alias A_Chord", 0, 0, 0);
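Because the command string interface takes plain text, building a command is ordinary string formatting. The following is a minimal, portable sketch of that step only. The helper names build_open_command and build_play_command are hypothetical and not part of MCI; the code formats the text and does not call mciSendString(), which on Windows would receive the resulting string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of MCI): formats an "open" command string
 * like the one on the slide. Returns 0 on success, -1 if the alias is empty
 * or the output buffer is too small. */
int build_open_command(char *out, size_t out_size,
                       const char *path, const char *alias)
{
    int n;
    assert(out != NULL && path != NULL && alias != NULL);
    if (strlen(alias) == 0) {
        return -1;
    }
    n = snprintf(out, out_size, "open %s type waveaudio alias %s",
                 path, alias);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper: formats "play <alias>" for a device opened above. */
int build_play_command(char *out, size_t out_size, const char *alias)
{
    int n;
    assert(out != NULL && alias != NULL);
    if (strlen(alias) == 0) {
        return -1;
    }
    n = snprintf(out, out_size, "play %s", alias);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

This sketch does not quote its arguments, so it assumes the path and alias contain no spaces, as in the slide's example.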

MCI

Some other commands:

Command message interface:

1. Start recording with MCI_RECORD

2. Write data to a wave file with MCI_SAVE

3. Stop with MCI_STOP

4. Play with MCI_PLAY

Command string interface:

1. Play with "play %s %s %s"

2. Stop with "stop %s %s %s"

DirectSound

Like other components of DirectX, DirectSound allows you to use the hardware in the most efficient way.

Here are some other things that DirectSound makes easy:

Querying hardware capabilities at run time to determine the best solution for any given personal computer configuration

Using property sets so that new hardware capabilities can be exploited even when they are not directly supported by DirectSound

Low-latency mixing of audio streams for rapid response

Implementing three-dimensional (3-D) sound

DirectSound

DirectSound playback is built on the IDirectSound Component Object Model (COM) interface and on the IDirectSoundBuffer interface for manipulating sound buffers.

DirectSound capture is based on the IDirectSoundCapture and IDirectSoundCaptureBuffer COM interfaces.

Low-level digital audio API: WaveX

1. Open audio device

2. Prepare structure for recording

3. Start recording

4. Data processing

5. Release structure

6. Close audio device

Open Audio Device

There are several different approaches you can take, depending upon how fancy and flexible you want your program to be.

1. Pass the value WAVE_MAPPER to open the "preferred" audio input/output device.

2. Call a function to get the list of devices, then open the audio device you want.

3. Open the device with waveInOpen() (input) or waveOutOpen() (output).

EXAMPLE

/* Open the preferred input device; messages go to the window myWindow */
result = waveInOpen(&outHandle, WAVE_MAPPER, &waveFormat,
                    (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);
if (result)
{
    printf("There was an error opening the preferred Digital Audio In device!\r\n");
}

EXAMPLE

iNumDevs = waveInGetNumDevs();
for (i = 0; i < iNumDevs; i++) {
    /* Query input capabilities, to match waveInGetNumDevs() */
    if (!waveInGetDevCaps(i, &wic, sizeof(WAVEINCAPS)))
    {
        printf("Device ID #%u: %s\r\n", i, wic.szPname);
    }
}
/* Valid device IDs run from 0 to iNumDevs - 1; open the last one listed */
result = waveInOpen(&outHandle, iNumDevs - 1, &waveFormat,
                    (DWORD_PTR)myWindow, 0, CALLBACK_WINDOW);

Structure WAVEFORMATEX

wFormatTag         PCM, mu-law, A-law
nChannels          Mono, stereo
nSamplesPerSec     Sample rate, e.g. 8000 Hz
nAvgBytesPerSec    Average data-transfer rate
nBlockAlign        Minimum atomic unit of data
wBitsPerSample     8 bits or 16 bits per sample
cbSize             Size of extra format information

Example

WAVEFORMATEX waveFormat;

/* Initialize the WAVEFORMATEX for 16-bit, 44.1 kHz, stereo */
waveFormat.wFormatTag = WAVE_FORMAT_PCM;
waveFormat.nChannels = 2;
waveFormat.nSamplesPerSec = 44100;
waveFormat.wBitsPerSample = 16;
waveFormat.nBlockAlign = waveFormat.nChannels * (waveFormat.wBitsPerSample / 8);
waveFormat.nAvgBytesPerSec = waveFormat.nSamplesPerSec * waveFormat.nBlockAlign;
waveFormat.cbSize = 0;
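The same arithmetic also tells you how large a recording buffer must be for a given duration. The sketch below is a portable illustration under stated assumptions: PcmFormat is a hypothetical stand-in for the WAVEFORMATEX fields used above (so it compiles without windows.h), and pcm_format_make and pcm_buffer_bytes are illustrative helper names, not Windows functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the WAVEFORMATEX fields used on the slide. */
typedef struct {
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
} PcmFormat;

/* Fill in the derived fields the same way the slide's example does. */
PcmFormat pcm_format_make(uint16_t channels, uint32_t rate, uint16_t bits)
{
    PcmFormat f;
    assert(channels > 0 && bits > 0 && bits % 8 == 0);
    f.nChannels = channels;
    f.nSamplesPerSec = rate;
    f.wBitsPerSample = bits;
    /* One block = one sample for every channel. */
    f.nBlockAlign = (uint16_t)(channels * (bits / 8));
    f.nAvgBytesPerSec = rate * f.nBlockAlign;
    return f;
}

/* Bytes needed to hold `ms` milliseconds of audio, as a whole number of
 * blocks (the sample count is rounded down). */
uint32_t pcm_buffer_bytes(const PcmFormat *f, uint32_t ms)
{
    uint32_t blocks = (uint32_t)(((uint64_t)f->nSamplesPerSec * ms) / 1000u);
    return blocks * f->nBlockAlign;
}
```

For the 16-bit, 44.1 kHz stereo format above this gives nBlockAlign = 4 and nAvgBytesPerSec = 176400. For 8000 Hz, 16-bit mono, a 100 ms buffer is 1600 bytes.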

Recording engine

[Figure: recording engine. Labels: audio device, waveInStart(), buffer1 to buffer4, msg, callback function, data processing, AddInBuffer()]

Recording engine

[Figure: recording engine after one rotation. Labels: audio device, buffer2, buffer3, buffer4, buffer1, msg, callback function, data processing]

Circular buffer
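The rotation of buffers can be modelled without any audio hardware. The following is a minimal sketch, assuming four buffers as on the slide: BufferRing, ring_init, ring_take_full and ring_requeue are hypothetical names for illustration only. In the real API the device hands back a full buffer through a message, and the application returns it with waveInAddBuffer().

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFFERS 4   /* the slide shows buffer1..buffer4 */

/* Hypothetical model of the queue of buffers handed to the audio device.
 * queue[] holds buffer indices in the order the device will fill them. */
typedef struct {
    int queue[NUM_BUFFERS];
    size_t head;   /* position of the buffer the device fills next */
    size_t count;  /* how many buffers the device currently owns */
} BufferRing;

/* Queue every buffer once, in order 0..NUM_BUFFERS-1. */
void ring_init(BufferRing *r)
{
    size_t i;
    r->head = 0;
    r->count = 0;
    for (i = 0; i < NUM_BUFFERS; i++) {
        r->queue[(r->head + r->count) % NUM_BUFFERS] = (int)i;
        r->count++;
    }
}

/* Device side: the oldest queued buffer is full and is handed back.
 * Returns its index, or -1 if no buffer is queued (audio would be lost). */
int ring_take_full(BufferRing *r)
{
    int idx;
    if (r->count == 0) {
        return -1;
    }
    idx = r->queue[r->head];
    r->head = (r->head + 1) % NUM_BUFFERS;
    r->count--;
    return idx;
}

/* Application side: after processing, put the buffer at the back of the
 * queue so the device can fill it again. */
void ring_requeue(BufferRing *r, int idx)
{
    assert(r->count < NUM_BUFFERS);
    r->queue[(r->head + r->count) % NUM_BUFFERS] = idx;
    r->count++;
}
```

As long as each buffer is requeued before the device runs through the others, recording continues without gaps; if processing falls behind, ring_take_full eventually returns -1, which corresponds to the device having nowhere to write.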

1+3+1

Three important methods:

1. Prepare a buffer for wave-audio input. Function: waveInPrepareHeader()

2. Send the buffer to the audio device; when the buffer is full, the application is notified. Function: waveInAddBuffer()

3. Start recording. Function: waveInStart()

Example

if (MMSYSERR_NOERROR !=
    waveInPrepareHeader(m_hWaveIn, &waveheader, sizeof(WAVEHDR)))
{
    printf("prepare buffer failure!");
}

waveInAddBuffer(m_hWaveIn, &waveheader, sizeof(WAVEHDR));

waveInStart(m_hWaveIn);

Message

Windows messages:

MM_WIM_DATA: this message is sent to a window when data is present in the input buffer and the buffer is being returned to the application.

Other messages: MM_WIM_CLOSE, MM_WIM_OPEN, MM_WOM_CLOSE, MM_WOM_DONE, MM_WOM_OPEN

Callback function messages:

WIM_DATA: this message is sent to the given callback function when data is present in the input buffer and the buffer is being returned to the application.

Other messages: WIM_CLOSE, WIM_OPEN, WOM_CLOSE, WOM_DONE, WOM_OPEN

Message Example

Callback message

waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
           (DWORD_PTR)waveInProc, 0L, CALLBACK_FUNCTION);

waveInProc(…..) {
    switch (msg) {
    case WIM_OPEN: ………….
        break;
    case WIM_DATA: ………….
        break;
    case WIM_CLOSE: …………
    }
}

Window message

waveInOpen(&m_hWaveIn, WAVE_MAPPER, &m_Format,
           (DWORD_PTR)hWnd, 0L, CALLBACK_WINDOW);

Application in Real-time Key Word Recognition 

[Figure: Key-Word Recognizer block diagram, dated 12/18/2003. Labels: Audio Interface, Front-End, Back-End, Monitor, Training/Testing/Analysis]

To be continued….

Application in Real-time Key Word Recognition

Practical problems when we apply this model in speech recognition:

1. Asynchronism

2. Efficiency

Application in Real-time Key Word Recognition

[Figure: Labels: buffer1, buffer2, buffer3, buffer4, …, buffer500, msg, CALL, callback function, data processing]

Comparison and Analysis

MCI is the easiest model and very convenient, but it offers the least control ("file level").

WaveX is more complicated, but gives flexible control over the audio data ("buffer level").

DirectSound is the most efficient method, but also the most complicated ("buffer level").

Conclusion

Apply MCI to the audio document part of "video conference".

Apply WaveX to real-time speech recognition and also to "video conference".

DirectSound is widely used in computer game design.