Image Processing Techniques for Speech Recognition. Presented by Amr Medhat, Mostafa Fathy and Sameh Serag. Supervised by Dr. Magda Fayek. Date: 5-5-2004

Audio Visual Speech Recognition

DESCRIPTION

An introduction to using image processing techniques in speech recognition

Page 1: Audio Visual Speech Recognition

Image Processing Techniques for Speech Recognition

Presented by

Amr Medhat, Mostafa Fathy and Sameh Serag

Supervised by

Dr. Magda Fayek

Date: 5-5-2004

Page 2: Audio Visual Speech Recognition

Agenda

• Introduction

• Audio Visual Modeling

• Spectrogram Reading

• Spectrogram Filtering

Page 3: Audio Visual Speech Recognition

Introduction

• What is Speech Recognition?

• Speech Recognition Problems
  – noise
  – inter- and intra-speaker variation
  – continuous speech: no boundaries between words

• Image Processing is a possible solution

Page 4: Audio Visual Speech Recognition

Audio Visual Speech Modeling

• Reading speech from facial and lip movements.

• Categorizing mouth shapes into visual phonemes (visemes)
  – phoneme: the smallest distinctive unit of speech sound

• Why?
  – distinguish between easily confused phonemes (such as f and s, or m and n)
  – improve recognition performance in noisy environments

Page 5: Audio Visual Speech Recognition

Demo

Ready for the challenge?

• Listen to this audio and try to understand the speech content: vox_mix[1].mov

• Listen to the speech with the video image: dig_tranexp[1].mov

• Did you understand the content? Get a prize from IBM

• Play the answer: vox_clean[1].mov (5893642)

Page 6: Audio Visual Speech Recognition

How?

Audio Feature Extraction

Visual Feature Extraction

Audio-Visual Integration

Page 7: Audio Visual Speech Recognition

Visual Features

• Geometric lip dimensions
  – Lip shape: height/width of the inner/outer lip

• Visibility of the tongue/teeth
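The geometric lip dimensions above can be illustrated with a small sketch. It assumes the lip contour is already available as a list of (x, y) pixel coordinates; how that contour is found is outside the slides, and the function name `lip_geometry` and the sample points are hypothetical.

```python
def lip_geometry(points):
    """Return (width, height, height/width) of one lip contour.

    points: list of (x, y) pixel coordinates along the inner or outer
    lip contour (assumed to be given; contour tracking is not shown).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)      # horizontal extent of the lips
    height = max(ys) - min(ys)     # vertical mouth opening
    ratio = height / width if width else 0.0
    return width, height, ratio


# Hypothetical outer-lip contour: left corner, top, right corner, bottom
outer = [(10, 20), (25, 14), (40, 20), (25, 30)]
width, height, ratio = lip_geometry(outer)
```

Computing the same three numbers for the inner and the outer contour of every video frame would give a compact visual feature vector per frame.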

Page 8: Audio Visual Speech Recognition

Audio-Visual Integration

• Feature Fusion

• Synchronization Problem

• Use low-resolution image
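One simple way to picture feature fusion and the synchronization problem is sketched below. It assumes audio features arrive at 100 frames per second and visual features at 25 frames per second, and it repeats each visual frame so that every audio frame has a partner before the two vectors are concatenated. The rates, the function names `upsample_nearest` and `fuse`, and the sample values are all assumptions for illustration, not details from the slides.

```python
def upsample_nearest(frames, src_rate, dst_rate, n_out):
    """Repeat slower-rate frames so that there are n_out frames at dst_rate."""
    out = []
    for i in range(n_out):
        # Index of the source frame covering output frame i (integer arithmetic)
        j = min((i * src_rate) // dst_rate, len(frames) - 1)
        out.append(frames[j])
    return out


def fuse(audio_feats, visual_feats, audio_rate=100, video_rate=25):
    """Concatenate each audio feature vector with the time-aligned visual one."""
    visual_up = upsample_nearest(visual_feats, video_rate, audio_rate,
                                 len(audio_feats))
    return [a + v for a, v in zip(audio_feats, visual_up)]


# Hypothetical data: 8 audio frames (2 values each), 2 video frames (width, height)
audio = [[0.1 * i, 0.2 * i] for i in range(8)]
visual = [[30.0, 16.0], [28.0, 10.0]]
fused = fuse(audio, visual)
```

Here the first four audio frames are paired with the first video frame and the next four with the second, so each fused vector has four values.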

Page 9: Audio Visual Speech Recognition

Spectrograms

• A Spectrogram:
  – Translation of speech into the visual domain

[Figure: spectrogram; axes labelled frequency and time]
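As a rough illustration of how speech samples become a time-frequency image, the sketch below computes a magnitude spectrogram with a short-time Fourier transform. The Hann window, the frame length, the hop size, the function name `spectrogram`, and the synthetic test tone are all assumptions made for this example; the slides do not say how their spectrograms were produced. It uses only the standard library, so it is slow and meant for reading rather than production use.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    # Hann window to reduce leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):   # non-negative frequencies only
            s = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            mags.append(abs(s))
        rows.append(mags)
    return rows


# Synthetic test signal: a 100 Hz tone sampled at 800 Hz
rate = 800
tone = [math.sin(2 * math.pi * 100 * n / rate) for n in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 64   # convert bin index to Hz
```

Plotting `spec` as an image, with time on one axis and frequency on the other, gives the kind of picture the later slides treat with image processing operations.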

Page 10: Audio Visual Speech Recognition

Spectrogram Reading

Waveform and Spectrogram of the word: "phonetician"

Page 11: Audio Visual Speech Recognition

Spectrogram Filtering

Required:

How?

Using: Morphological Processing

Page 12: Audio Visual Speech Recognition

Morphological Processing

• Based on the theory of Mathematical Morphology ?!!

• Stresses the role of shape in image preprocessing used for region identification.

• Important morphological operations:
  – Erosion
  – Dilation
  – Opening
  – Closing

Page 13: Audio Visual Speech Recognition

Erosion & Dilation

• Erosion: the meaning
  – Used to shrink objects.

• Dilation: the meaning
  – Dual of erosion.
  – Used to fill small gaps or valleys between shapes.

• Both are irreversible
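The two operations can be sketched on a small binary image as follows. This is a minimal illustration assuming a square 3x3 structuring element and treating everything outside the image as background; the function names and the test image are made up for the example.

```python
def erode(img, size=3):
    """Binary erosion: a pixel stays 1 only if its whole neighbourhood is 1."""
    r = size // 2
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w
                     and img[y + dy][x + dx] == 1
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, size=3):
    """Binary dilation: a pixel becomes 1 if any pixel in its neighbourhood is 1."""
    r = size // 2
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w
                     and img[y + dy][x + dx] == 1
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


# A 3x3 object plus one isolated pixel
img = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
eroded = erode(img)            # only the centre of the 3x3 object survives
dilated = dilate(img)          # both shapes grow and merge
restored = dilate(eroded)      # the isolated pixel does not come back
```

The last line illustrates irreversibility: dilating the eroded image recovers the 3x3 object but not the isolated pixel that erosion removed.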

Page 14: Audio Visual Speech Recognition

Opening & Closing

• Both are used for smoothing an object contour

• Opening: erosion followed by dilation
  – smooths the contour from the inside of the object; separates objects

• Closing: dilation followed by erosion
  – smooths the contour from the outside of the object; fills in small holes

[Figure: diagram with stages labelled Erosion and Dilation]
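A compact sketch of opening and closing, built from erosion and dilation with an assumed 3x3 square structuring element and background outside the image. The helper `_morph` and both test images are hypothetical and chosen only to show the two effects named on the slide.

```python
def _morph(img, size, combine):
    """Apply `combine` (all -> erosion, any -> dilation) over each neighbourhood."""
    r = size // 2
    h, w = len(img), len(img[0])
    return [[int(combine(0 <= y + dy < h and 0 <= x + dx < w
                         and img[y + dy][x + dx] == 1
                         for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def erode(img, size=3):
    return _morph(img, size, all)


def dilate(img, size=3):
    return _morph(img, size, any)


def opening(img, size=3):
    """Erosion followed by dilation."""
    return dilate(erode(img, size), size)


def closing(img, size=3):
    """Dilation followed by erosion."""
    return erode(dilate(img, size), size)


# Two 3x3 objects joined by a one-pixel bridge at row 2, column 4
bridged = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
opened = opening(bridged)      # the bridge disappears, the objects separate

# A 5x5 object with a one-pixel hole in its centre
holed = [[int(1 <= y <= 5 and 1 <= x <= 5 and (y, x) != (3, 3))
          for x in range(7)] for y in range(7)]
closed = closing(holed)        # the hole is filled
```

In both cases the large shapes keep their size, which is what distinguishes opening and closing from plain erosion or dilation.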

Page 15: Audio Visual Speech Recognition

Spectrogram Filtering

[Figure: filtering pipeline with stages labelled convert, thresholding, dilation, erosion, ANDing, convert]
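The middle stages of this pipeline (thresholding, dilation, erosion, ANDing) might look like the sketch below. The slide does not say what is ANDed with what, so this example assumes the cleaned binary mask is applied to the original spectrogram, keeping the original values wherever the mask is 1. The threshold level, the 3x3 structuring element, and all function names are likewise assumptions; the two "convert" stages are not shown.

```python
def _morph(img, size, combine):
    """Apply `combine` (all -> erosion, any -> dilation) over each neighbourhood."""
    r = size // 2
    h, w = len(img), len(img[0])
    return [[int(combine(0 <= y + dy < h and 0 <= x + dx < w
                         and img[y + dy][x + dx] == 1
                         for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def filter_spectrogram(spec, level, size=3):
    """Threshold, dilate, erode, then mask the original spectrogram."""
    # Thresholding: 1 where the energy is at least `level`
    mask = [[int(v >= level) for v in row] for row in spec]
    # Dilation followed by erosion fills small gaps in the mask
    mask = _morph(mask, size, any)
    mask = _morph(mask, size, all)
    # "ANDing" (assumed meaning): keep original values only inside the mask
    return [[v if m else 0.0 for v, m in zip(row, mrow)]
            for row, mrow in zip(spec, mask)]


# Hypothetical spectrogram patch: strong region with one weak cell, weak background
spec = [[0.1] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        spec[y][x] = 0.9
spec[2][2] = 0.4               # gap inside the strong region

filtered = filter_spectrogram(spec, level=0.5)
```

In this example the weak background is zeroed, while the below-threshold cell inside the strong region is kept because dilation followed by erosion closes the gap in the mask.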

Page 16: Audio Visual Speech Recognition